Adaptive reasoning formats that let a model decide on the fly which reasoning steps are truly needed can cut the number of tokens processed by as much as ninety percent while leaving answer quality essentially unchanged. The trick is to replace a single monolithic chain of computation with a handful of lightweight alternatives chosen dynamically. When the extra logic for picking the right path adds only a few hundred milliseconds, the trade-off becomes hard to refuse.

Parallel reasoning has become the de facto way to boost Large Reasoning Models, but the cost of evaluating every possible path quickly dwarfs any gains in accuracy. Visual-language systems show a similar symptom: they often "overthink," generating long chains of internal dialogue even when a simple perception step would suffice. Prior work has mostly treated pruning as a post-hoc filter or relied on static heuristics, leaving a gap for methods that learn to drop unnecessary computation as part of the model's forward pass.…
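To make the "handful of lightweight alternatives chosen dynamically" idea concrete, here is a minimal sketch of a router that picks the cheapest adequate reasoning format per query and measures the token savings against always running the full chain. Everything in it is a hypothetical illustration, not the method described here: the format names (`direct_answer`, `short_chain`, `full_chain`), their token budgets, the difficulty thresholds, and the scalar `difficulty` score are all assumptions. The score stands in for whatever learned signal a real model would use to decide which steps are needed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningFormat:
    name: str
    token_budget: int  # hypothetical number of tokens this format spends


# Hypothetical formats, ordered cheapest first; budgets are illustrative, not measured.
FORMATS = [
    ReasoningFormat("direct_answer", 50),
    ReasoningFormat("short_chain", 300),
    ReasoningFormat("full_chain", 2000),
]

# Hypothetical thresholds: a query whose difficulty falls below FORMATS[i]'s
# threshold is routed to FORMATS[i]; anything harder gets the full chain.
THRESHOLDS = [0.3, 0.7]


def choose_format(difficulty: float) -> ReasoningFormat:
    """Return the cheapest format whose threshold exceeds the estimated difficulty."""
    for fmt, threshold in zip(FORMATS, THRESHOLDS):
        if difficulty < threshold:
            return fmt
    return FORMATS[-1]


def token_savings(difficulties: list[float]) -> float:
    """Fraction of tokens saved versus always running the most expensive format."""
    baseline = FORMATS[-1].token_budget * len(difficulties)
    if baseline == 0:
        return 0.0  # no queries, nothing saved
    adaptive = sum(choose_format(d).token_budget for d in difficulties)
    return 1 - adaptive / baseline


if __name__ == "__main__":
    batch = [0.1, 0.2, 0.5, 0.9]  # hypothetical difficulty estimates
    for d in batch:
        print(d, choose_format(d).name)
    print(f"tokens saved: {token_savings(batch):.0%}")  # prints "tokens saved: 70%"
```

With these made-up budgets, a batch dominated by easy queries saves most of the tokens, while a batch of hard queries saves nothing, which mirrors why the routing overhead only pays off when many inputs do not need the full chain.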