Meta Llama 4 Scout & Maverick: The Complete Production Guide (17B Active MoE, 10M Context, iRoPE, and the vLLM/Ollama Deployment Playbook)

Published on the ManoIT Tech Blog (Korean original).

On April 5, 2025, Meta released Llama 4 Scout and Llama 4 Maverick. They are the first open-weights models in the Llama family to use a Mixture-of-Experts (MoE) architecture and the first to be natively multimodal. Scout is also the first to offer a 10M-token context window in a model that, per Meta, fits on a single H100 GPU with Int4 quantization. This post covers the architecture, benchmarks, license caveats, deployment options, the Llama Guard 4 safety stack, and the Behemoth delay, all from a production-deployment perspective grounded in what ManoIT recommends to customers.

1. The Llama 4 family at a glance: Scout / Maverick / Behemoth

Llama 4 was planned from the start as a three-model family. The April 5 release brought two of the models; the largest, Behemoth, is still in training and has not been released. Knowing where each model fits is the first decision in any adoption review.…