Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You ship the RAG service. Tests are green, p99 retrieval latency is around 40 ms, and the dashboard looks healthy. Then a deploy lands at 03:00 UTC, the pod restarts, and the first user query takes eight seconds. The second one takes 38 ms. Same query, same index, same code path.

The graph in Grafana shows a single sharp spike on every rolling deploy, and your on-call shrugs and says it is fine because the spike never lasts.

It is not fine. The spike is the index reading itself off disk one page at a time while the user sits at a loading spinner. On a quiet morning that is one frustrated user.…