Originally published on NextFuture

Between May 11 and May 13, 2026, nine separate engineering blogs, dev.to writeups, and arXiv benchmarks published specific evidence about how AI coding agents break in production. The pieces cite real numbers: round two of Works With Agents scored Claude Sonnet 4 at 85.0 percent while SmolLM3 3B hit 93.3 percent; a 10 Security Mistakes writeup documented agent loops making 30 wrong commits and deleting 100 database rows in a single bad run; and a 1.5-year Cursor-vs-Claude-Code-vs-Codex retrospective put the rotation cost in the "hundreds of dollars" bucket per developer. None of these sources references the others. This post does the aggregation so the failure taxonomy fits on one page.…