
Why the First Turn in a Coding Agent Can Use So Many Input Tokens — and Why That Gets Better Over Time

DEV Community·Sho Tanaka (tsho)·about 1 month ago

#ai #snowflake #claude #openai #prompt #cache

Coding agents such as Cortex Code, Claude Code, Codex, and Cursor rely on large language models (LLMs) behind the scenes. A common question from users is: “Why does my first turn consume so many input tokens when I only typed a short prompt?” This post explains how prompt caching works in these systems, why the first turn often looks expensive, and why cache hit rates usually improve as a session continues.

Key point: Coding agents like Cortex Code benefit from the same general prompt-caching principles described by Anthropic and OpenAI. Understanding those mechanics helps you interpret token usage more accurately.

1. Why the First Turn Can Look Expensive

1-1. Why users notice high input token usage on the first turn

When you start a new session in a coding agent and type something simple like “fix the typo in line 3,” the API usage may show thousands of input tokens, far more than your short message.…
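To make the gap between a short prompt and the reported input tokens concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the system prompt and tool definitions are invented placeholders, the 4-characters-per-token estimate is a rough heuristic rather than a real tokenizer, and the cache model is idealized (the whole previous request is treated as a cache hit on the next turn). Actual providers apply their own rules for what is cached and for how long.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate: about 4 characters per token (assumption, not a real tokenizer)."""
    return max(1, len(text) // 4)


# Hypothetical request parts. Real agents send their own prompts and tool schemas.
SYSTEM_PROMPT = "You are a coding agent. Follow the repository conventions. " * 100
TOOL_DEFINITIONS = '{"name": "read_file", "description": "Read a file from disk."} ' * 50
USER_MESSAGE = "fix the typo in line 3"


def first_turn_breakdown() -> dict[str, int]:
    """Estimate how many input tokens each part of the first request contributes."""
    parts = {
        "system_prompt": estimate_tokens(SYSTEM_PROMPT),
        "tool_definitions": estimate_tokens(TOOL_DEFINITIONS),
        "user_message": estimate_tokens(USER_MESSAGE),
    }
    parts["total"] = sum(parts.values())
    return parts


def hit_rate_by_turn(static_tokens: int, per_turn_tokens: int, turns: int) -> list[float]:
    """Idealized cache hit rate per turn.

    Turn 1 starts with a cold cache. On every later turn, all tokens sent in
    the previous request are assumed to be read from cache, and only the
    newly appended tokens are uncached.
    """
    rates = []
    previous_total = 0  # nothing is cached before the first request
    for _ in range(turns):
        total = previous_total + per_turn_tokens
        if previous_total == 0:
            total += static_tokens  # system prompt and tools are sent on turn 1
        rates.append(previous_total / total)
        previous_total = total
    return rates


if __name__ == "__main__":
    print(first_turn_breakdown())
    print([round(r, 3) for r in hit_rate_by_turn(2000, 200, 5)])
```

In this toy model the user message is a tiny share of the first request, and the hit rate starts at zero and climbs on each later turn because the already-sent prefix keeps growing relative to the new tokens.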
