Coding agents such as Cortex Code, Claude Code, Codex, and Cursor rely on large language models (LLMs) behind the scenes. A common question from users is: “Why does my first turn consume so many input tokens when I only typed a short prompt?” This post explains how prompt caching works in these systems, why the first turn often looks expensive, and why cache hit rates usually improve as a session continues.

Key point: Coding agents like Cortex Code benefit from the same general prompt-caching principles described by Anthropic and OpenAI. Understanding those mechanics helps you interpret token usage more accurately.

1. Why the First Turn Can Look Expensive

1-1. Why users notice high input token usage on the first turn

When you start a new session in a coding agent and type something simple like “fix the typo in line 3,” the API usage may show thousands of input tokens, far more than your short message.…