Vector Index Cold Start: Why Your First Query Takes 8 Seconds


DEV Community·Gabriel Anhaia·25 days ago


#pattern #rag #ai #warm #query #first


Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production

Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go

My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools

Me: xgabriel.com | GitHub

You ship the RAG service. Tests are green, p99 retrieval is around 40 ms, the dashboard looks healthy. Then a deploy lands at 03:00 UTC, the pod restarts, and the first user query takes eight seconds. The second one takes 38 ms. Same query, same index, same code path. The graph in Grafana shows a single sharp spike on every rolling deploy, and your on-call shrugs and says it is fine because the spike never lasts.

It is not fine. The spike is the index reading itself off disk one page at a time while the user sits at a loading spinner. On a quiet morning that is one frustrated user.…
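To make the page-at-a-time read concrete, here is a minimal sketch of the work a cold first query ends up doing implicitly. It assumes the index lives in a single file that is memory-mapped, which the text above does not state. The function name `touch_pages` and the temporary stand-in file are hypothetical and do not come from any particular vector database.

```python
import mmap
import os
import tempfile
import time


def touch_pages(path: str) -> int:
    """Read one byte from every page of the file via mmap; return pages touched."""
    size = os.path.getsize(path)
    if size == 0:
        return 0  # mmap cannot map an empty file
    touched = 0
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        for offset in range(0, size, mmap.PAGESIZE):
            _ = mm[offset]  # first access to an uncached page forces a read from disk
            touched += 1
    return touched


if __name__ == "__main__":
    # Hypothetical stand-in for an index file: 64 pages of zeros.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"\0" * (64 * mmap.PAGESIZE))
    try:
        start = time.perf_counter()
        pages = touch_pages(tmp.name)
        elapsed_ms = (time.perf_counter() - start) * 1000
        print(f"touched {pages} pages in {elapsed_ms:.2f} ms")
    finally:
        os.unlink(tmp.name)
```

On a freshly written temporary file the pages are most likely already in the OS page cache, so the timing here will look fast. The eight-second case corresponds to the same loop running against a large file that the kernel has not cached yet, such as right after a pod restart.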
