A month ago I switched my Claude Code setup so that the terminal calls go through a local Ollama model instead of hitting Anthropic. The claim I made publicly at the time was that this cuts Claude Code spending by roughly 90%. A few people in the comments asked, very fairly, "where does that number come from?" So I started tracking. Every Claude Code session for the next 30 days got logged: what kind of task it was, which engine handled it, how long it took, and — for the Anthropic-side calls — what it cost in tokens at published per-million pricing. I also rated quality 1 to 5 on every task: did the model actually do the job, or did I have to bounce it back to Sonnet? The takeaway is more nuanced than the headline. The 90% number basically held, but not because Gemma is "as good as Sonnet" — it isn't. The savings are real because a surprising fraction of what you actually ask Claude Code to do is mechanical, and mechanical work doesn't need a frontier model. Here's the breakdown.…