Vector Index Cold Start: Why Your First Query Takes 8 Seconds

DEV Community·Gabriel Anhaia·25 days ago

Book: RAG Pocket Guide: Retrieval, Chunking, and Reranking Patterns for Production
Also by me: Thinking in Go (2-book series): Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub, an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

You ship the RAG service. Tests are green, p99 retrieval is around 40 ms, the dashboard looks healthy. Then a deploy lands at 03:00 UTC, the pod restarts, and the first user query takes eight seconds. The second one takes 38 ms. Same query, same index, same code path.

The graph in Grafana shows a single sharp spike on every rolling deploy, and your on-call shrugs and says it is fine because the spike never lasts.

It is not fine. The spike is the index reading itself off disk one page at a time while the user sits at a loading spinner. On a quiet morning that is one frustrated user.…
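One way to picture the fix the symptom points at: pay the disk reads at startup instead of on the first user query. The sketch below is not taken from the article. It assumes the index lives in a single file that fits in RAM and is served through the OS page cache (for example via mmap), and the names `warm_file` and `ReadinessGate` are hypothetical. It reads the file sequentially before the process reports itself ready, so a rolling deploy would only route traffic to a pod whose index pages are already cached.

```python
import os
import tempfile

CHUNK_SIZE = 1 << 20  # 1 MiB per read; an arbitrary choice for this sketch


def warm_file(path: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Read the whole file sequentially and return the number of bytes read.

    The data is discarded. The point is the side effect: the kernel pulls the
    file's pages into the page cache, so later random reads are less likely
    to hit disk. This is an assumption about the deployment, not a guarantee:
    the kernel may evict pages under memory pressure.
    """
    total = 0
    with open(path, "rb", buffering=0) as f:
        while True:
            block = f.read(chunk_size)
            if not block:
                break
            total += len(block)
    return total


class ReadinessGate:
    """Hypothetical gate that a readiness probe handler would consult."""

    def __init__(self) -> None:
        self.ready = False

    def warm_then_open(self, index_path: str) -> int:
        # Warm first, flip the flag second, so no query arrives cold.
        warmed = warm_file(index_path)
        self.ready = True
        return warmed


if __name__ == "__main__":
    # Stand-in for a real index file, only so the sketch runs on its own.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(os.urandom(4 * CHUNK_SIZE))
        demo_path = tmp.name
    gate = ReadinessGate()
    print("bytes warmed:", gate.warm_then_open(demo_path), "ready:", gate.ready)
    os.remove(demo_path)
```

In a real service the `ready` flag would back whatever readiness endpoint the orchestrator polls. Whether a plain sequential read is enough depends on how the vector library loads its index, which the excerpt above does not specify.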
