I benchmarked code retrieval for AI coding agents on 60 tasks

DEV Community·Nikita Groshin·about 1 month ago
#why #ai #opensource #programming #sverklo #grep

A tuned grep beat my MCP code-intelligence server on F1 by 9 points. I'm publishing the result anyway. Here's why.

Why this benchmark exists

I've spent the last six months building sverklo, a local-first MCP server that gives AI coding agents (Claude Code, Cursor, Windsurf) a real symbol graph instead of grep-based pattern matching. The product positioning has always been "stops the agent from hallucinating function names that don't exist in your codebase."

That positioning is hand-wavy without numbers. Six months in, I had no public benchmark. Whatever speed-of-iteration story I had was one I was only telling myself.

So I built one: 60 hand-verified retrieval tasks across two real OSS codebases (expressjs/express and the sverklo repo itself), three baselines (naive grep, smart grep, sverklo), and metrics that measure both retrieval quality (F1, recall, precision) and the thing AI agents actually pay for (input tokens, tool calls, wall time). Results live at sverklo.com/bench. …
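For readers who want the retrieval-quality metrics spelled out, here is a minimal sketch of how precision, recall, and F1 could be computed for a single task. It assumes each task has a hand-verified set of relevant files and that a baseline returns a set of files; the function name `retrieval_scores` and the file paths are hypothetical, and the benchmark's actual scoring may differ.

```python
def retrieval_scores(retrieved, relevant):
    """Return (precision, recall, f1) for one retrieval task.

    Assumption: results are compared as sets of file paths, so
    duplicates and ordering are ignored.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)

    # Guard against empty sets to avoid division by zero.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0

    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical task: 4 relevant files, the tool returns 3, of which 2 are right.
relevant = ["src/router.js", "src/app.js", "src/request.js", "src/response.js"]
retrieved = ["src/router.js", "src/app.js", "src/utils.js"]

precision, recall, f1 = retrieval_scores(retrieved, relevant)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which is why a tool that returns many extra files can lose on F1 even with high recall.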
