How we measured 99.6% token reduction across 15 task-runs

DEV Community·J. Gravelle·about 1 month ago
#how #mcp #claude #jcodemunch #tokens #benchmark

Two months after publishing the headline, here are the receipts.

Two months ago I published "Your AI Agent Is Dumpster Diving Through Your Code." The most common reply was some flavor of: "Cool numbers, but how did you actually measure them?" Fair question. Here's the answer.

What we measured

The jCodeMunch benchmark measures retrieval token efficiency: how many LLM input tokens a code-exploration tool consumes compared to reading all source files. It does not measure answer quality, latency, or end-to-end task completion. Those are separate axes (we measure precision separately in jMunchWorkbench, but that's a different post).

Three repos, five queries, run on 2026-03-28:

| Repository        | Files | Symbols | Baseline tokens |
|-------------------|-------|---------|-----------------|
| expressjs/express | 165   | 181     | 137,978         |
| fastapi/fastapi   | 951   | 5,325   | 699,425         |
| gin-gonic/gin     | 98    | 1,489   | 187,018         |

The five queries cover the most common code-exploration intents I see in the wild: "router route handler", "middleware", "error exception", "request response", "context bind".…
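To make the setup concrete, here is a minimal sketch of how a per-run reduction figure could be computed from the numbers above. It assumes reduction is defined as one minus the ratio of tool tokens to baseline tokens, which is the natural reading of "compared to reading all source files" but is not spelled out in this excerpt. The function name `token_reduction` and the constant names are hypothetical, not part of jCodeMunch. Three repos crossed with five queries gives the 15 task-runs in the title.

```python
from itertools import product

# Baseline token counts per repository, copied from the table in the article.
BASELINE_TOKENS = {
    "expressjs/express": 137_978,
    "fastapi/fastapi": 699_425,
    "gin-gonic/gin": 187_018,
}

# The five query strings listed in the article.
QUERIES = [
    "router route handler",
    "middleware",
    "error exception",
    "request response",
    "context bind",
]


def token_reduction(baseline_tokens: int, tool_tokens: int) -> float:
    """Return the fraction of baseline tokens saved by the tool.

    Assumed definition (not confirmed by the article excerpt):
    reduction = 1 - tool_tokens / baseline_tokens.
    """
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    if tool_tokens < 0:
        raise ValueError("tool_tokens must be non-negative")
    return 1.0 - tool_tokens / baseline_tokens


# Every (repository, query) pair is one task-run: 3 repos x 5 queries.
task_runs = list(product(BASELINE_TOKENS, QUERIES))
```

Calling `token_reduction(BASELINE_TOKENS[repo], tool_tokens)` for each pair in `task_runs` would give one figure per run, where `tool_tokens` is whatever the tool actually consumed for that query. How the per-run figures are combined into the single headline number (a mean of per-run ratios or a ratio of totals) is not stated in this excerpt, so the sketch stops at the per-run value.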
