Composition, retrieval fidelity, and the failure modes between $vectorSearch and a cited answer.

The hard part of a production RAG system is not running a single $vectorSearch. It is composing keyword retrieval, vector retrieval, ranking fusion, reranking, and a generative model into a pipeline that returns the right passage, with the right confidence, every time, under concurrency, across tenants, and with predictable latency.

In MongoDB Atlas, this composition can be done end-to-end inside a single cluster. Operational data, full-text indexing, vector indexing, multi-tenant filtering, and the retrieval that feeds an LLM are not separate stores. They are stages of the same aggregation framework.

This text walks through the architectural decisions behind a complete RAG knowledge base built on Atlas Search and Atlas Vector Search, examining where each stage contributes, where each stage fails silently, and what it takes to keep the pipeline honest end-to-end.…
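As a rough illustration of what "composing" these stages can look like, the sketch below builds a tenant-filtered $vectorSearch stage as a plain dictionary and fuses a keyword ranking with a vector ranking. Reciprocal rank fusion is used only as one common fusion method; the text above does not say which method the pipeline uses. The index name `kb_vector_index`, the fields `embedding` and `tenant_id`, the 20x candidate multiplier, and the constant `k=60` are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def vector_search_stage(query_vector: Sequence[float], tenant_id: str, limit: int = 10) -> Dict:
    """Build a $vectorSearch aggregation stage restricted to one tenant.

    Index and field names are hypothetical. The filter field would need to be
    declared as a filter field in the vector index definition.
    """
    return {
        "$vectorSearch": {
            "index": "kb_vector_index",            # hypothetical index name
            "path": "embedding",                   # hypothetical vector field
            "queryVector": list(query_vector),
            # Oversample candidates relative to the final limit (assumed 20x).
            "numCandidates": min(limit * 20, 10000),
            "limit": limit,
            "filter": {"tenant_id": {"$eq": tenant_id}},  # hypothetical tenant field
        }
    }


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document ids into one ranked list.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with ranks starting at 1. Ties are broken by document id so the output
    is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    keyword_hits = ["a", "b", "c"]   # ids ranked by a full-text query
    vector_hits = ["b", "d", "a"]    # ids ranked by a vector query
    print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
    print(vector_search_stage([0.1, 0.2], "tenant-a", limit=5))
```

With these inputs the fused order is `['b', 'a', 'd', 'c']`: documents found by both retrievers rise above those found by only one. In a real deployment the two rankings would come from running the keyword and vector stages against the cluster, which this sketch deliberately leaves out.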