Prompt Caching With the Claude API: A Practical Guide

DEV Community·GDS K S·about 1 month ago

#ai #anthropic #api #performance #cache #prompt

I noticed a pattern while looking at three months of Anthropic invoices: the same 8 KB system prompt was billed at full price on every request. Same instructions, same tool definitions, same RAG context, charged again every turn. The fix takes about ten lines of code and cuts the input bill by roughly 90 percent on cached tokens. This guide is the version of that fix I wish I had bookmarked a year ago.

TL;DR

| Question | Short answer |
|---|---|
| What does it do? | Stores a prefix of your prompt server-side so later requests skip re-encoding it |
| How much does it save? | Cache reads cost 10 percent of the base input rate, so up to 90 percent off cached tokens |
| What does it cost to write? | The first write costs 1.25x the base input rate (5-minute TTL) or 2x (1-hour TTL) |
| When does it pay off? | When the same prefix is sent at least twice within a 5-minute TTL, or at least three times within a 1-hour TTL |
| Smallest cacheable chunk? | 1024 tokens for Sonnet and Opus, 2048 tokens for Haiku |
| Where do I put the marker? | On the last block of the chunk you want cached |

1.…
