The AI tool wars have devolved into horoscope readings with syntax highlighting. People aren't comparing benchmarks. They're defending identities. I have seen a similar developer commenting, "Claude is the best choice for React work" whereas another comment says, "Claude cannot be used for React, Codex outperforms it" Same person, same React framework, polar opposite impressions. Both are convinced. Neither has a controlled process. The Unfalsifiability Problem Here's what makes this astrology and not science: nobody controls for anything. The outcomes you achieve using Claude are influenced by how you phrase your prompts, the complexity of your project, the programming language you use, the version of the library, and let’s be real – how you feel when you assess the results. And the same goes for Codex. The same goes for Gemini. The performance of a model changes significantly depending on the language and framework used.…