By Jessie, COO at EvoLink You open Claude Code at 9am. By noon, you're rate-limited. Your colleague does twice the work and still has quota left at 5pm. Same Max subscription. What's going on? I ran into this exact situation and went digging. Turns out Anthropic published an internal engineering blog — "Lessons from building Claude Code: Prompt Caching is Everything" — that explains the whole thing. The short version: your daily habits are probably destroying your cache hit rate, and that's costing you 10-20x more tokens per message than necessary. Here's what I learned and what I changed. The Core Mechanic: Prefix Caching Every request Claude Code sends to the model follows this structure: System prompt + Tool definitions → Project docs (CLAUDE.md) → Session context → Messages Enter fullscreen mode Exit fullscreen mode The API caches this sequence from the front. On the next request, if the prefix matches what was cached before, it reuses the prior computation.…