
I Tested 5 AI Coding Tools on Real Work. Here Are the Results.

DEV Community·yan yan·19 days ago
#task #github #cursor #claude #code #error

I gave Copilot, Cursor, Claude Code, Windsurf, and Aider the same 3 real tasks. The results were not even close.

AI coding tools are everywhere. GitHub Copilot. Cursor. Claude Code. Windsurf. Aider. Every week there is a new one, and every review says "this tool changed my life."

I don't trust those reviews. Most test on toy problems: a todo app, sorting an array, fetching from an API. That is not how real software works.

So I designed a real-world benchmark. Three tasks pulled from my actual work. Not contrived. Not simplified. The same mess you deal with every day.

Here are the results.

The Test Setup

The tasks:

Legacy refactor: A 400-line Python script with no tests, no types, and a known bug. Add type hints, write tests, and fix the bug without breaking anything else.

Greenfield feature: Build a real-time data pipeline with WebSocket ingestion, transformation, and PostgreSQL writes.…
