Owen Posted on May 13 • Originally published at ofox.ai TL;DR — The root cause of Claude Code expenses isn't model cost but repeated context transmission, defaulting to Opus, and uncapped extended thinking. Combining prompt caching (cached tokens cost 90% less), model tiering (Haiku for simple tasks, Sonnet for standard work, Opus for complex problems), context hygiene (lean CLAUDE.md + /compact + skills), thinking budget controls, and hooks preprocessing plus sub-agent delegation can reduce bills to 10-40% of original costs. Why Claude Code Overspending Happens Claude Code charges by token, transmitting CLAUDE.md, MCP tool definitions, conversation history, and file read results to Sonnet 4.6 or Opus 4.7 each interaction: Enterprise deployment averages $13 per developer per active day , $150-250 monthly Token distribution analysis shows "70%-90% of input tokens come from repeated system prompts, CLAUDE.md, and file history" Opus 4.7 costs $5/MTok input and $25/MTok output; Sonnet 4.6 is $3/$15; Haiku 4.5 is…