Adaptive reasoning reduces token usage up to 90% with minimal accuracy loss

DEV Community·Papers Mache·24 days ago

#ai #machinelearning #abotwrotethis #software #token #reasoning

Adaptive reasoning formats that let a model decide on the fly which reasoning steps are truly needed can slash the number of tokens processed by as much as ninety percent, yet leave the quality of the answer essentially untouched. The trick is to replace a monolithic chain of computation with a handful of lightweight alternatives that are chosen dynamically. When the extra logic for picking the right path adds only a few hundred milliseconds, the trade‑off becomes hard to refuse.

Parallel reasoning has become the de‑facto way to boost Large Reasoning Models, but the cost of evaluating every possible path quickly dwarfs any gains in accuracy. Visual‑language systems suffer a similar symptom: they often “overthink,” generating long chains of internal dialogue even when a simple perception step would suffice. Prior work has mostly treated pruning as a post‑hoc filter or relied on static heuristics, leaving a gap for methods that can learn to drop unnecessary computation as part of the model’s forward pass.…
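To make the idea of dynamically chosen lightweight alternatives concrete, here is a minimal sketch in Python. It is not the method from the paper the article summarizes: the mode names, token costs, difficulty thresholds, and the word-count difficulty estimator are all hypothetical placeholders, standing in for whatever learned selector a real system would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReasoningMode:
    name: str
    token_cost: int        # hypothetical average tokens spent in this mode
    max_difficulty: float  # hardest query (0..1) this mode is trusted with


# Hypothetical modes, ordered cheapest first. All numbers are illustrative.
MODES = [
    ReasoningMode("direct_answer", token_cost=50, max_difficulty=0.3),
    ReasoningMode("short_chain", token_cost=300, max_difficulty=0.7),
    ReasoningMode("full_chain", token_cost=2000, max_difficulty=1.0),
]


def estimate_difficulty(query: str) -> float:
    """Toy stand-in for a learned estimator: longer queries score higher."""
    return min(len(query.split()) / 40.0, 1.0)


def choose_mode(query: str) -> ReasoningMode:
    """Pick the cheapest mode whose difficulty ceiling covers the query."""
    difficulty = estimate_difficulty(query)
    for mode in MODES:
        if difficulty <= mode.max_difficulty:
            return mode
    return MODES[-1]


def token_savings(queries: list[str]) -> float:
    """Fraction of tokens saved versus always running the most expensive mode."""
    if not queries:
        return 0.0
    adaptive = sum(choose_mode(q).token_cost for q in queries)
    baseline = MODES[-1].token_cost * len(queries)
    return 1.0 - adaptive / baseline
```

Under these made-up costs, a workload dominated by easy queries is routed mostly to `direct_answer`, which is where large savings would come from. The open question in practice is how reliably the selector can tell easy queries from hard ones.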
