Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production

Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go

My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools

Me: xgabriel.com | GitHub

You've shipped a RAG service. It works. Now imagine OpenAI ships a hypothetical text-embedding-next, your eval rig says it picks up 4 points of recall on the corpus you actually care about, and your VP wants the upgrade live by Friday.

The naive plan is: stop ingest, drop the old vectors, re-embed the corpus, write the new vectors back, resume traffic. On a small corpus this takes minutes. On 8 million chunks it takes a weekend, and your support bot returns nothing useful while the swap is mid-flight.

The right shape is a dual-write re-index. Writes land in both the old and the new index from the moment you start. Reads keep going to the old one.…
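
To make the two rules above concrete (every write goes to both indexes, every read still goes to the old one), here is a minimal sketch. Everything in it is an assumption made for illustration: `InMemoryIndex`, `DualWriteIndexer`, and the toy embedders are hypothetical names standing in for whatever vector store and embedding client you actually run, not any real library's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]
Embedder = Callable[[str], Vector]


@dataclass
class InMemoryIndex:
    """Hypothetical stand-in for a real vector index (dict of id -> vector)."""
    vectors: Dict[str, Vector] = field(default_factory=dict)

    def upsert(self, chunk_id: str, vector: Vector) -> None:
        self.vectors[chunk_id] = vector


@dataclass
class DualWriteIndexer:
    """Hypothetical wrapper: writes fan out to both indexes, reads stay on the old one."""
    old_index: InMemoryIndex
    new_index: InMemoryIndex
    old_embed: Embedder
    new_embed: Embedder

    def write(self, chunk_id: str, text: str) -> None:
        # Each index gets a vector from its own model; the two vector
        # spaces are not comparable, so they must never be mixed.
        self.old_index.upsert(chunk_id, self.old_embed(text))
        self.new_index.upsert(chunk_id, self.new_embed(text))

    def read_index(self) -> InMemoryIndex:
        # Queries keep hitting the old index while the new one fills up.
        return self.old_index


if __name__ == "__main__":
    # Toy embedders (assumptions): real ones would call an embedding model.
    old_embed = lambda text: [float(len(text))]
    new_embed = lambda text: [float(len(text)), float(text.count(" "))]

    indexer = DualWriteIndexer(InMemoryIndex(), InMemoryIndex(), old_embed, new_embed)
    indexer.write("chunk-1", "refund policy for annual plans")

    print(len(indexer.old_index.vectors), len(indexer.new_index.vectors))
    print(indexer.read_index() is indexer.old_index)
```

The sketch keeps the two embedders paired with their own indexes on purpose: a query embedded with one model is meaningless against vectors from the other, which is why reads cannot simply be split between the two while the new index is still incomplete.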