The Prompt Tax Most LLM Teams Are Silently Paying

1 / 2

The Prompt Tax Most LLM Teams Are Silently Paying

DEV Community·Parag Darade·about 1 month ago

#iH7aYzTv

#ai #llm #rag #prompt #percent #tokens

Reading 0:00

15s threshold

The Prompt Tax Most LLM Teams Are Silently Paying Anthropic shipped prompt caching in August 2024. Nearly two years later, Datadog's State of AI Engineering report found that only 28 percent of LLM API calls across their observed production deployments show cached-read tokens — despite the fact that 69 percent of all input tokens in those same deployments live in system prompts. The math is not subtle: most teams are sending the same fifty thousand tokens on every request and paying full rate for all of them. This is not an obscure optimization from a recent release. Both Anthropic and OpenAI have had prompt caching available for over a year. OpenAI applies it automatically on GPT-4o calls longer than 1,024 tokens, at a 50 percent discount, requiring zero code changes. Anthropic's implementation requires marking your cache breakpoints explicitly, but the discount is steeper: cache reads on Claude Sonnet cost $0.30 per million tokens versus $3.00 per million for fresh input — a 90 percent reduction.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

The Prompt Tax Most LLM Teams Are Silently Paying