Caching Pre-Computed Embeddings: TTL, Versioning, and the Cold-Start Problem

DEV Community·Gabriel Anhaia·27 days ago

#rag #ai #cache #embedding #model #vector

Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You ship a RAG system. The first cost report lands. Embedding spend is a third of your bill, and most of it is paying to re-embed the same chunks you already embedded yesterday. The corpus barely moves. The user query distribution is heavy on the same handful of intents. Every webhook that touches a document re-embeds it from scratch. Every nightly index rebuild ignores the previous run's vectors and asks the vendor for new ones.

The fix is obvious: cache the embeddings. Look up by content, return the stored vector if it exists, call the API only when it does not. Three lines of pseudocode.

Then the questions show up. What if the source document changes?…
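The look-up-by-content idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the article's own implementation: `EmbeddingCache`, `content_key`, and the injected `embed_fn` are hypothetical names, the store is a plain in-memory dict, and folding the model name into the key is an assumption made here so that vectors from different models never collide.

```python
import hashlib
from typing import Callable, Dict, List

Vector = List[float]


def content_key(text: str, model: str) -> str:
    """Hash the model name and the chunk text into a stable cache key."""
    h = hashlib.sha256()
    h.update(model.encode("utf-8"))
    h.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") differ
    h.update(text.encode("utf-8"))
    return h.hexdigest()


class EmbeddingCache:
    """Content-addressed cache in front of an embedding call.

    embed_fn is a stand-in for whatever client actually calls the
    vendor API (hypothetical; supply your own).
    """

    def __init__(self, embed_fn: Callable[[str], Vector], model: str) -> None:
        self._embed_fn = embed_fn
        self._model = model
        self._store: Dict[str, Vector] = {}
        self.hits = 0
        self.misses = 0

    def get(self, text: str) -> Vector:
        key = content_key(text, self._model)
        cached = self._store.get(key)
        if cached is not None:
            # Same content seen before: return the stored vector.
            self.hits += 1
            return cached
        # Unseen content: pay for the API call once, then remember it.
        self.misses += 1
        vector = self._embed_fn(text)
        self._store[key] = vector
        return vector
```

Because the key is derived from the text itself, an edited chunk hashes to a new key and misses the cache, while an untouched chunk hits it no matter which webhook or rebuild asked for it. Expiry and persistence are deliberately left out of this sketch.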
