The scaling-is-everything story has a new challenger. On May 6, 2026, Zyphra released ZAYA1-8B, an open-weight Mixture-of-Experts reasoning model with 8.4 billion total parameters and fewer than 800 million active per token. On AIME 2025, a benchmark where DeepSeek-R1 scores 87.5 with its 671-billion-parameter footprint, ZAYA1-8B scores 91.9. That gap in total parameter count (671B versus 8.4B, roughly 80x, or close to two orders of magnitude) is the headline, but the engineering story underneath is more interesting.

This guide covers the architecture, the novel test-time compute method, how to run the model today, and what the benchmark numbers actually mean in practice.

Why This Model Matters Right Now

The reasoning model landscape in 2026 has sorted into two camps: closed frontier models (GPT-5, Claude 4.5 Sonnet, Gemini 2.5 Pro) that require API calls with unpredictable pricing, and large open-weight models (DeepSeek-R1-0528 at 671B, Llama 4 Maverick at 400B+) that can technically run locally but in practice demand multi-GPU clusters.…