I ran a simple VibeCode Arena duel, voted confidently based on what I saw, and then watched the evaluation metrics expose how little of the important stuff I had actually checked.

I nearly trusted an AI-generated app this week, much faster than I should have, and what bothered me afterward was how normal that trust felt in the moment. I opened the preview, clicked around, saw that everything responded the way it was supposed to, and in under a minute I had mentally filed it under "yeah, this seems usable." There was no hesitation. No deeper inspection. Just a quick visual pass and a growing sense that the output was probably fine. That confidence lasted until the evaluation metrics loaded and made it painfully obvious that I had reviewed the wallpaper while ignoring the foundation.

This came from a Duel I ran on VibeCode Arena. Same prompt, two models generating side by side, and a blind vote before the platform reveals what is actually going on under the hood.…