Book: Prompt Engineering Pocket Guide: Techniques for Getting the Most from LLMs Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools Me: xgabriel.com | GitHub A team I talked to last month flipped on Anthropic's prompt caching for their RAG endpoint. Sixty thousand tokens of system prompt, tool definitions, and pinned reference docs in front of every user message. The dashboards promised a flat 90% off the input bill. Synthetic load tests confirmed it. Production day one came back at a 1% discount, with only the timestamp differing between staging and prod. Their system prompt opened with f"Today is {datetime.now().date()}. ..." . One token of date. Zero cache hits, all day. Anthropic's caching keys on the exact prefix you sent. A single byte of drift throws the whole hash away. That is the caveat.…