This week I shipped a benchmark for code-intelligence MCP servers and posted the results, including the cases where my own tool lost. Within 36 hours, the maintainer of one of the competing tools (jcodemunch-mcp) had shipped three back-to-back releases addressing specific findings the benchmark exposed. Adding new tests for those fixes then exposed a symmetric blind spot in my own parser, so I shipped a fix too. That whole loop (competing maintainers iterating on the same eval, each fixing their own tool, inside 36 hours) is what a public benchmark is supposed to produce. It almost never happens, and I think most of the time that's because the benchmark lives in the wrong place. So I moved mine.…