- Why hybrid search outperforms pure lexical or dense retrieval in production
- First-stage architecture: fusing vector similarity with BM25 and metadata filters
- Reranking: cross-encoders, MonoT5 and late-interaction models that raise precision
- Recall engineering: document expansion, query augmentation and fusion tactics that recover missed hits
- Practical checklist and step-by-step playbook for low-latency RAG retrieval

Hybrid retrieval, the pragmatic marriage of keyword matching and semantic vectors, is the engineering pattern that actually lets RAG systems hit both high recall and strict latency SLAs in production. Getting this right means thinking in stages: filter aggressively, retrieve broadly, then rerank carefully.

The symptom is familiar: queries look good in isolation but fail on hard cases. Rare named entities disappear, filters (date, tenant, jurisdiction) cause noisy results, and an expensive cross-encoder reranker kills your SLA whenever traffic spikes.…
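
To make the "filter, retrieve broadly, rerank" staging concrete, here is a minimal sketch in plain Python. Everything in it is illustrative rather than taken from a specific stack: the toy corpus, the `tenant` field, and the helper names are hypothetical. The lexical scorer is a term-count stand-in for BM25, and the "dense" scorer is a character-trigram cosine standing in for real embeddings. The fusion step uses reciprocal rank fusion (RRF) with `k=60`, a commonly used default.

```python
from collections import Counter
from math import sqrt

# Toy corpus; ids, tenants, and texts are hypothetical.
DOCS = [
    {"id": "a", "tenant": "acme", "text": "hybrid search fuses bm25 and vector similarity"},
    {"id": "b", "tenant": "acme", "text": "cross encoder rerankers raise precision at a latency cost"},
    {"id": "c", "tenant": "other", "text": "hybrid search for another tenant"},
    {"id": "d", "tenant": "acme", "text": "metadata filters narrow the candidate set early"},
]


def lexical_score(query, doc):
    """Stand-in for BM25: how often query terms occur in the document."""
    counts = Counter(doc["text"].lower().split())
    return sum(counts[t] for t in query.lower().split())


def _trigrams(text):
    text = text.lower()
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def dense_score(query, doc):
    """Stand-in for embedding similarity: cosine over character trigrams."""
    q, d = _trigrams(query), _trigrams(doc["text"])
    dot = sum(q[g] * d[g] for g in q)
    norm = sqrt(sum(v * v for v in q.values())) * sqrt(sum(v * v for v in d.values()))
    return dot / norm if norm else 0.0


def rrf(rankings, k=60):
    """Reciprocal rank fusion: each list contributes 1 / (k + rank) per doc."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


def retrieve(query, tenant, first_stage_k=3, final_k=2, rerank=None):
    # Stage 1: filter aggressively on metadata before any scoring.
    candidates = [d for d in DOCS if d["tenant"] == tenant]

    # Stage 2: retrieve broadly with both scorers, then fuse the two rankings.
    def ranked(score):
        ordered = sorted(candidates, key=lambda d: score(query, d), reverse=True)
        return [d["id"] for d in ordered[:first_stage_k]]

    fused = rrf([ranked(lexical_score), ranked(dense_score)])

    # Stage 3: apply the (expensive) reranker only to the small fused set.
    if rerank is not None:
        by_id = {d["id"]: d for d in candidates}
        fused = sorted(fused, key=lambda i: rerank(query, by_id[i]), reverse=True)
    return fused[:final_k]


results = retrieve("hybrid search", tenant="acme")
```

In this sketch the reranker is an optional callable, so the costly stage only ever sees `first_stage_k`-sized candidate sets and can be skipped entirely under load. A real system would swap in an actual BM25 index, an embedding model, and a cross-encoder for the stand-ins.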