I Tested 28 Query Pairs to See if Semantic Caches Actually Lie to Users. The Result Surprised Me

DEV Community·Kristian Ivanov·about 1 month ago

#ai #rag #cache #tier #valkey #threshold

I had a clean theory about how RAG caches silently corrupt your answers. Then I built one, ran the numbers, and the actual failure mode was the opposite of what I expected.

Let's face it

Most of us building RAG pipelines treat the LLM call as atomic. Query comes in, embed, retrieve, generate, return. If we cache anything, we slap a Redis in front of it, key by the query string, call it a day, and move on to the next ticket.

Then someone tells you "we should use semantic caching, embeddings will catch the paraphrases" and you nod, set a similarity threshold of 0.95 because that sounds reasonable, and ship it.

Last weekend I decided to actually build the thing end-to-end and see what shakes out. The setup: a public RAG chatbot built on the docs of three RESP-compatible databases - Valkey, Redis, and Dragonfly - with full caching infrastructure underneath and live observability on every turn. The whole thing is at chat.betterdb.com, and it shows you live data for each query as well as the aggregates.…
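For readers who have not built one, the threshold-based lookup described above can be sketched in a few lines. This is a minimal illustration, not the code behind chat.betterdb.com: `toy_embed` is a hypothetical bag-of-words stand-in for a real embedding model, and `SemanticCache` with its linear scan is an assumed structure rather than anything the article specifies. Only the 0.95 threshold comes from the text.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Assumed minimal cache: linear scan, return the best match above threshold."""

    def __init__(self, threshold: float = 0.95):
        self.threshold = threshold
        self.entries = []  # list of (embedding, cached answer)

    def put(self, query: str, answer: str) -> None:
        self.entries.append((toy_embed(query), answer))

    def get(self, query: str):
        """Return (answer or None, best similarity score seen)."""
        q = toy_embed(query)
        best_score, best_answer = 0.0, None
        for emb, answer in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_answer = score, answer
        if best_score >= self.threshold:
            return best_answer, best_score
        return None, best_score


strict = SemanticCache(threshold=0.95)
strict.put("how do I set a key ttl in valkey", "Use EXPIRE key seconds.")

# Same wording, different database: one word out of nine differs.
answer, score = strict.get("how do I set a key ttl in redis")
print(answer, round(score, 3))
```

With this toy embedding the two queries differ in a single word, so the score lands just under the strict threshold and the lookup misses. Lowering the threshold to 0.85 would turn the same lookup into a hit that serves the Valkey answer to a Redis question. Real embedding models score paraphrases very differently, so the numbers here show the mechanism only.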
