If you've built a multi-turn AI agent and watched it degrade over long task chains, becoming repetitive, losing the thread, producing inconsistent outputs 20 turns in, you've probably blamed the context window, the system prompt, or the base model quality. There's a more fundamental cause, and a January 2026 preprint describes it with enough precision to change how you think about the problem. The Paper AT²PO: Agentic Turn-based Policy Optimization via Tree Search identifies three structural failure modes in multi-turn agentic systems trained with reinforcement learning. Failure mode 1: Exploration diversity collapses. Over extended task chains, RL-trained agents converge toward a narrow set of behaviors. They stop genuinely exploring and start repeating. The model is technically "trying different things," but the actual diversity of strategies drops off as training progresses.…