I Almost Cancelled Claude: I Ran My Own Benchmarks Before Pulling the Trigger

📰

I Almost Cancelled Claude: I Ran My Own Benchmarks Before Pulling the Trigger

DEV Community·Juan Torchia·about 1 month ago

#english #typescript #claudecode #llm #cases #degradation

Reading 0:00

15s threshold

I Almost Cancelled Claude: I Ran My Own Benchmarks Before Pulling the Trigger I was reviewing a PR from my team on Tuesday afternoon when I caught the Hacker News thread. "I cancelled Claude" — 874 points, 400+ comments, the kind of conversation that explodes because it puts words to something a lot of people had been feeling but hadn't articulated. I read the whole thing. Then I closed the tab and opened my own logs. I've had Claude Code running against the same set of test cases since March. Not an academic benchmark — these are the real scenarios I throw at it in my actual workflow: TypeScript module refactoring, SQL migration generation, code path analysis in my monorepo on Railway. If there's degradation, my logs have it. And if they don't, then the HN thread is mostly emotional noise. Spoiler: the degradation is real. Just not where most people are complaining. Claude Quality Degradation in 2025: What My Logs Say vs. What HN Says My tracking setup is simple.…

Continue reading — create a free account

Join HashtagPLUS to read full articles, follow hashtags, vote, and join the conversation.

Create free account Log in

Menu

I Almost Cancelled Claude: I Ran My Own Benchmarks Before Pulling the Trigger