INQUIRING LINE

Can budget-tightening curricula improve reasoning efficiency more than fixed budgets?

This explores whether training a model on a schedule of shrinking token budgets (generous first, then progressively tighter) buys better reasoning efficiency than just training under one fixed budget — and why that staging helps.


The corpus answers directly: yes, and the reason is that learning to reason well and learning to reason cheaply are two different jobs. Models trained with progressively tightening token budgets reach higher accuracy *and* better token efficiency than fixed-budget baselines, because the curriculum splits training into an exploration phase (discover strategies while budgets are generous) and a compression phase (distill those strategies once the budget clamps down); see "Does gradually tightening token budgets beat fixed budget training?". A fixed budget forces both jobs to happen at once, and that is its disadvantage.
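The two-phase idea can be sketched as a schedule that holds a generous budget and then tightens it. This is a minimal illustration, not the method from the source note: the linear decay, the class `BudgetCurriculum`, the function `budgeted_reward`, and every numeric value are assumptions chosen for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetCurriculum:
    """Hypothetical schedule: hold a generous budget, then tighten linearly."""

    start_budget: int   # tokens allowed during exploration (assumed value)
    end_budget: int     # tokens allowed once fully compressed (assumed value)
    explore_steps: int  # training steps spent at the generous budget
    total_steps: int    # step at which the budget reaches end_budget

    def budget_at(self, step: int) -> int:
        """Return the token budget in force at a given training step."""
        if step < self.explore_steps:
            return self.start_budget          # exploration phase
        if step >= self.total_steps:
            return self.end_budget            # fully tightened
        # Fraction of the compression phase completed so far, in [0, 1).
        progress = (step - self.explore_steps) / (self.total_steps - self.explore_steps)
        return round(self.start_budget + progress * (self.end_budget - self.start_budget))


def budgeted_reward(correct: bool, tokens_used: int, budget: int) -> float:
    """Illustrative reward: credit only for a correct answer that fits the budget."""
    if not correct:
        return 0.0
    return 1.0 if tokens_used <= budget else 0.0


curriculum = BudgetCurriculum(start_budget=8000, end_budget=2000,
                              explore_steps=1000, total_steps=3000)
```

A fixed-budget baseline is the special case where `start_budget == end_budget`, which is what makes the comparison in the paragraph above a like-for-like one.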

Why does compressing late work at all? Because more thinking is not free upside. Accuracy is non-monotonic in thinking length: pushing one model from ~1,100 to ~16K thinking tokens dropped accuracy from 87.3% to 70.3%, as it overthought easy problems and underthought hard ones (see "Does more thinking time always improve reasoning accuracy?"). So there is genuine slack to cut: a tightening curriculum exploits the fact that the generous-budget version was partly wasting tokens rather than using them.
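One way to read a budget sweep is to pick the cheapest budget that reaches peak accuracy. The sketch below uses only the two points reported above, with the approximate token counts taken at face value; the helper `best_budget` is hypothetical, and a real sweep would need many more points to locate the peak.

```python
def best_budget(sweep: list[tuple[int, float]]) -> tuple[int, float]:
    """Return the (budget, accuracy) pair with the highest accuracy.

    Ties go to the smaller budget, since fewer tokens is cheaper.
    """
    return max(sweep, key=lambda point: (point[1], -point[0]))


# The two (thinking tokens, accuracy %) points reported in the note.
reported = [(1_100, 87.3), (16_000, 70.3)]

budget, accuracy = best_budget(reported)
# Accuracy lost by moving from the short budget to the long one, in points.
drop = round(reported[0][1] - reported[1][1], 1)
```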

The more interesting question is whether the efficiency comes from the *budget schedule itself* or from training structure more broadly, and the corpus leans toward the latter. Reasoning models keep beating non-reasoning ones at any inference budget because training installs a protocol that makes extra tokens productive; the gap is about training structure and deployment mechanisms, not raw capability (see "Can non-reasoning models catch up with more compute?"). In the same spirit, RL training flips extended thinking from counterproductive self-doubt into useful gap analysis: training mediates the *quality* of reasoning, not just its quantity (see "Does extended thinking help or hurt model reasoning?"). A budget curriculum is one lever within that broader truth: it shapes when and how the model learns to spend tokens rather than adding capability.

There's a cheaper rival worth knowing about. If you only want brevity, you may not need a curriculum, or any retraining, at all. Verbose and concise chains of thought turn out to occupy distinct linear regions of activation space, and a single steering vector extracted from 50 paired examples cut chain-of-thought length by 67% with a 2.73x speedup and no accuracy loss (see "Can we steer reasoning toward brevity without retraining?"). That reframes the original question: a tightening curriculum earns its cost when you want the model to genuinely *learn* a more efficient reasoning policy, whereas inference-time steering buys compression off the shelf when you just want shorter output now.
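To make "a single steering vector" concrete, here is a minimal sketch using the common difference-of-means construction. That construction is an assumption: the source note does not say how Activation-Steered Compression derives its vector. The function names, the `strength` parameter, and the toy two-dimensional activations are all illustrative.

```python
Vector = list[float]


def mean_vector(rows: list[Vector]) -> Vector:
    """Average a list of equal-length activation vectors, dimension by dimension."""
    count = len(rows)
    return [sum(column) / count for column in zip(*rows)]


def steering_vector(verbose_acts: list[Vector], concise_acts: list[Vector]) -> Vector:
    """Direction pointing from the verbose mean toward the concise mean."""
    verbose_mean = mean_vector(verbose_acts)
    concise_mean = mean_vector(concise_acts)
    return [c - v for c, v in zip(concise_mean, verbose_mean)]


def steer(hidden: Vector, direction: Vector, strength: float = 1.0) -> Vector:
    """Shift a hidden state along the steering direction at inference time."""
    return [h + strength * d for h, d in zip(hidden, direction)]


# Toy activations standing in for paired verbose/concise examples.
verbose = [[1.0, 0.0], [3.0, 0.0]]
concise = [[0.0, 2.0], [0.0, 4.0]]
direction = steering_vector(verbose, concise)
steered = steer([1.0, 1.0], direction, strength=0.5)
```

Nothing here touches model weights, which is the contrast with a curriculum: the shift is applied at inference time and can be removed by setting `strength` to zero.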

One caution the corpus adds: efficiency gains measured on final accuracy can hide reasoning damage. Supervised fine-tuning raised benchmark scores while cutting Information Gain by 38.9%, meaning right answers were being produced via post-hoc rationalization rather than genuine inferential steps, a degradation that final-answer metrics miss (see "Does supervised fine-tuning improve reasoning or just answers?"). So if you adopt budget-tightening, the success test isn't just "same accuracy, fewer tokens"; it's whether the compressed reasoning is still doing real inferential work underneath.
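That stricter success test can be written as a three-way check. This is a sketch under stated assumptions: `step_quality` stands in for whatever intermediate-step metric you trust (the note mentions Information Gain but does not define how to compute it), and the tolerance defaults are placeholders rather than recommended values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    accuracy: float      # final-answer accuracy, in [0, 1]
    mean_tokens: float   # average reasoning tokens per problem
    step_quality: float  # hypothetical score for intermediate reasoning steps


def compression_is_sound(
    baseline: EvalResult,
    compressed: EvalResult,
    max_accuracy_drop: float = 0.01,  # assumed tolerance, absolute
    max_quality_drop: float = 0.05,   # assumed tolerance, relative
) -> bool:
    """Pass only if tokens fall while accuracy AND step quality both hold."""
    fewer_tokens = compressed.mean_tokens < baseline.mean_tokens
    accuracy_held = compressed.accuracy >= baseline.accuracy - max_accuracy_drop
    quality_held = compressed.step_quality >= baseline.step_quality * (1 - max_quality_drop)
    return fewer_tokens and accuracy_held and quality_held


baseline = EvalResult(accuracy=0.80, mean_tokens=4000, step_quality=1.0)
healthy = EvalResult(accuracy=0.80, mean_tokens=2000, step_quality=0.97)
hollow = EvalResult(accuracy=0.82, mean_tokens=2000, step_quality=0.611)
```

The `hollow` case is the failure mode described above: accuracy rises and tokens fall, yet the check fails because step quality has collapsed.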


Sources (6 notes)

Does gradually tightening token budgets beat fixed budget training?

Models trained with progressively tightening token budgets consistently achieve higher accuracy and better token efficiency than fixed-budget baselines. The approach works by separating learning into exploration (discovering strategies with generous budgets) and compression (distilling them under constraints).

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Can non-reasoning models catch up with more compute?

Reasoning models persistently outperform non-reasoning models regardless of inference budget because training instills a reasoning protocol that makes additional tokens productive. The gap is fundamentally about deployment mechanisms and training structure, not raw capability.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Does supervised fine-tuning improve reasoning or just answers?

Supervised fine-tuning improves final-answer accuracy on benchmarks but cuts Information Gain by 38.9 percent, meaning models generate correct answers through post-hoc rationalization rather than genuine inferential steps. Standard metrics miss this degradation because they only measure final correctness.

Next inquiring lines