I had a clean theory about how RAG caches silently corrupt your answers. Then I built one, ran the numbers, and the actual failure mode was the opposite of what I expected. Let's face it Most of us building RAG pipelines treat the LLM call as atomic. Query comes in, embed, retrieve, generate, return. If we cache anything, we slap a Redis in front of it, key by the query string, call it a day, and move on to the next ticket. Then someone tells you "we should use semantic caching, embeddings will catch the paraphrases" and you nod, set a similarity threshold of 0.95 because that sounds reasonable, and ship it. Last weekend I decided to actually build the thing end-to-end and see what shakes out. The setup: a public RAG chatbot trained on the docs of three RESP-compatible databases - Valkey, Redis, and Dragonfly - with full caching infrastructure underneath and live observability on every turn. The whole thing is at chat.betterdb.com and it shows you live data for each query, as well as the aggregates.…