How I Built a KV-Cache Control Plane for LLM Inference, With Real Benchmark Results

LLM inference is expensive. The prefill step, which processes the prompt, is the biggest cost. If you've seen the same prompt before, you shouldn't have to recompute it. That's the core idea behind KV-cache reuse.

But in a distributed system with multiple inference nodes, a new problem emerges: where is the cached prefix stored, and how do you route requests to maximize reuse?

I built llm-serving-cache to answer that question: a metadata-driven control plane for LLM KV-cache placement and routing.

The Problem

In a single-node setup, KV-cache reuse is straightforward. The cache is local and the router is trivial.…
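To make the single-node case concrete, here is a minimal sketch of local prefix reuse. It is not the code from llm-serving-cache. The names `LocalPrefixCache`, `block_hashes`, and `BLOCK_SIZE` are hypothetical, and the block-hashing scheme is an assumption chosen for illustration. The idea is that each full block of prompt tokens is hashed together with everything before it, so a lookup walks the blocks in order and stops at the first miss.

```python
import hashlib
from typing import Dict, List, Sequence

BLOCK_SIZE = 4  # hypothetical number of tokens per cache block


def block_hashes(tokens: Sequence[int]) -> List[str]:
    """Return one chained hash per full block: each hash covers the whole prefix so far."""
    hashes: List[str] = []
    running = hashlib.sha256()
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore a trailing partial block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        running.update(",".join(map(str, block)).encode() + b";")
        hashes.append(running.copy().hexdigest())
    return hashes


class LocalPrefixCache:
    """Toy stand-in for a node-local KV cache (stores placeholders, not tensors)."""

    def __init__(self) -> None:
        self._blocks: Dict[str, str] = {}  # prefix hash -> placeholder KV handle

    def insert(self, tokens: Sequence[int]) -> None:
        """Record every full block of this prompt as cached."""
        for h in block_hashes(tokens):
            self._blocks.setdefault(h, f"kv-{h[:8]}")

    def cached_prefix_len(self, tokens: Sequence[int]) -> int:
        """Number of leading tokens whose KV entries are already cached."""
        reused = 0
        for h in block_hashes(tokens):
            if h not in self._blocks:
                break  # first miss ends the reusable prefix
            reused += BLOCK_SIZE
        return reused
```

With this sketch, inserting a 10-token prompt caches its first two full blocks. A later prompt that shares those first 8 tokens gets `cached_prefix_len(...) == 8`, so only the remaining tokens need prefill. With a single node there is only one such cache to consult, which is why routing is trivial in this setting.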