Two months after publishing the headline, here are the receipts.

Two months ago I published "Your AI Agent Is Dumpster Diving Through Your Code." The most common reply was some flavor of: "Cool numbers, but how did you actually measure them?" Fair question. Here's the answer.

What we measured

The jCodeMunch benchmark measures retrieval token efficiency: how many LLM input tokens a code-exploration tool consumes compared to reading all source files. It does not measure answer quality, latency, or end-to-end task completion. Those are separate axes (we measure precision separately in jMunchWorkbench, but that's a different post).

Three repos, five queries, run on 2026-03-28:

| Repository        | Files | Symbols | Baseline tokens |
|-------------------|------:|--------:|----------------:|
| expressjs/express |   165 |     181 |         137,978 |
| fastapi/fastapi   |   951 |   5,325 |         699,425 |
| gin-gonic/gin     |    98 |   1,489 |         187,018 |

The five queries cover the most common code-exploration intents I see in the wild: "router route handler", "middleware", "error exception", "request response", and "context bind". …
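To make the metric concrete, here is a minimal sketch of how retrieval token efficiency can be computed: the baseline is the token count of every source file read in full, and a tool is scored by the fraction of that baseline it avoids. This is an illustration under stated assumptions, not the jCodeMunch harness itself. The names `baseline_tokens`, `token_reduction`, and `RepoBaseline` are hypothetical, the tokenizer is passed in as a plain function because the benchmark's tokenizer is not specified here, and any per-query tool token counts you feed it are your own measurements, not published results.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class RepoBaseline:
    name: str
    files: int
    symbols: int
    baseline_tokens: int  # tokens needed to read every source file in full


# Baseline figures copied from the benchmark table (run on 2026-03-28).
BASELINES = [
    RepoBaseline("expressjs/express", 165, 181, 137_978),
    RepoBaseline("fastapi/fastapi", 951, 5_325, 699_425),
    RepoBaseline("gin-gonic/gin", 98, 1_489, 187_018),
]


def baseline_tokens(sources: Iterable[str], count_tokens: Callable[[str], int]) -> int:
    """Tokens an agent would spend reading every source file in full.

    `count_tokens` is an assumption: plug in whatever tokenizer matches
    the model you are measuring against.
    """
    return sum(count_tokens(text) for text in sources)


def token_reduction(baseline: int, tool_tokens: int) -> float:
    """Fraction of the baseline a tool avoids: 1 - tool_tokens / baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    if tool_tokens < 0:
        raise ValueError("tool_tokens must be non-negative")
    return 1.0 - tool_tokens / baseline


if __name__ == "__main__":
    # Toy whitespace "tokenizer", used only to exercise the code path.
    toy_count = lambda text: len(text.split())
    print(baseline_tokens(["def route(): pass", "app = App()"], toy_count))

    # Combined baseline across the three repositories.
    print(sum(r.baseline_tokens for r in BASELINES))
```

Keeping the tokenizer pluggable means the same arithmetic works whichever model's token counts you care about. A reduction near 1.0 means the tool answered the query while reading only a small slice of the repository.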