I’ve tested enough local models at this point to stop trusting benchmark charts. Most of them look impressive until you actually give them a real project. Then things fall apart:

- context gets messy
- reasoning becomes inconsistent
- responses drift
- code suggestions start contradicting earlier answers

So when Google released the Gemma 4 models, I wasn’t expecting much beyond another benchmark-heavy launch. But after spending a few days testing the 26B MoE model locally, I think this is the first open Mixture-of-Experts model that actually feels stable enough for real development work.

Not perfect. But noticeably different.

My Test Was Simple

Instead of synthetic prompts, I used an actual Rails codebase I work on regularly. I fed the model:

- Sidekiq workers
- service objects
- serializers
- migrations
- API integrations
- ActiveRecord scopes
- some old messy business logic I never cleaned up

Around 40+ files total. This is usually where smaller or poorly optimized models start losing track of relationships between files.…
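For anyone who wants to try a similar multi-file test, here is a minimal sketch of one way to bundle project files into a single block of context. It is not the exact tooling used for this test: the `build_context` helper, the `### FILE:` header format, and the glob list are all assumptions for illustration, based on conventional Rails directory names.

```ruby
require "pathname"

# Assumed Rails-style locations (hypothetical list; adjust to your project).
DEFAULT_GLOBS = %w[
  app/workers/**/*.rb
  app/services/**/*.rb
  app/serializers/**/*.rb
  app/models/**/*.rb
  db/migrate/*.rb
].freeze

# Hypothetical helper: reads every file matching the globs under `root`
# and joins them into one string, each file prefixed with its relative path
# so the model can tell which code lives where.
def build_context(root, globs = DEFAULT_GLOBS)
  root = Pathname.new(root)
  paths = globs.flat_map { |glob| Pathname.glob(root.join(glob).to_s) }.uniq.sort

  paths.map do |path|
    relative = path.relative_path_from(root)
    "### FILE: #{relative}\n#{path.read}"
  end.join("\n\n")
end
```

Calling `build_context(Dir.pwd)` from the project root returns one string you can paste or pipe into whatever local runner you use. Keeping the relative path in each header matters here, since the whole point of the test is whether the model keeps track of relationships between files.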