A hidden behavior makes Claude Haiku 4.5 cost five times less than Opus 4.7. GPT-5 mini runs at one-seventh the price of GPT-5.2. And Gemini 3.1 Flash-Lite? Cents per million tokens, real-time inference. In 2026, if you use AI, you probably use one of these small models. There's near-certainty it exists thanks to a technique called distillation . A big expensive model generates thousands of responses. A smaller one learns to imitate them. Your bill drops by an order of magnitude. That part wasn't supposed to be a problem. TL;DR: Anthropic just co-published a paper in Nature with UC Berkeley and Truthful AI. When a small model learns by imitating a big one , it doesn't only copy answers. Something else transits. A behavioral signature that filters miss and researchers can't fully explain. The model you use has a training history you'll never read. Anthropic spent February 2026 publicly accusing DeepSeek, Moonshot, and MiniMax of distilling Claude through thousands of fraudulent accounts.…