TL;DR. I fed one month of real trading-bot failure logs to four models. Gemma 4 31B. Gemini 3.1 Pro. DeepSeek V4 Pro. And Gemma 4 wrapped in a self-validation loop. Raw Gemma 4 caught 6 of the 8 structural issues a closed-model baseline found. At 1/170th the price. Wrapping Gemma 4 in a Generator → Critic → Synthesizer harness didn't add new findings. It sharpened the ones the model already had. The break-even win-rate estimate moved from a naïve 50% to a defensible 64%. The gap between open and closed models on analytical tasks isn't about raw capability anymore. It's about harness design.

Why I ran this comparison

For the past month I've been operating WILD_SNIPER V3.7.1. A small spot-trading bot on Binance USDT pairs. It's a one-developer hobby project. ccxt-based. REST polling. GRID-style entries on volume + price-drop triggers. ATR stop-loss. Trailing exit. Position size $6.50. Real money. Small money.

On 2026-05-12 I shut it down. Cumulative PnL over 27 hours of live trading: -$1.93. Not a disaster.…