Two hard problems in production AI: Accuracy : RAG systems giving wrong answers 48% of the time Cost : LLM API bills hitting $47K/month We solved both. Here's how. Part 1: RAG Accuracy (52% → 89%) Our RAG system was confidently wrong. Users asked "What were Q2 healthcare results?" and got Q1 data, footnotes, and chapter titles with zero content. High similarity scores. Completely useless context. The LLM wasn't the problem. Retrieval was broken. The 6-Stage Pipeline Stage 1: Query Processing Problem: "Show me Q2 results" has no semantic information. Solution: Query expansion + metadata extraction def process_query ( raw_query : str ) -> ProcessedQuery : metadata = extract_metadata ( raw_query ) # dates, entities expanded = expand_query ( raw_query , metadata ) embedding = embed_with_context ( expanded , metadata ) return ProcessedQuery ( expanded , metadata , embedding ) Enter fullscreen mode Exit fullscreen mode Transformation: Input: "Show me Q2 results" Output: "quarterly financial results Q2 2024…