Can models learn when to think versus answer directly?

This explores whether a model can decide on its own when a problem needs step-by-step reasoning versus a quick direct answer — and what that routing decision is actually selecting between.

The corpus says yes, and the most direct evidence is Thinkless, which trains one model to route between extended thinking and concise responses using a decoupled reinforcement learning scheme (DeGRPO) that separates the *mode choice* from the *answer quality*. The trick matters: when both are optimized at once, models collapse into always-think or never-think. Decoupling lets the model self-calibrate when to spend the extra compute, with no hand-labeled difficulty tags (see "Can models learn when to think versus respond quickly?").
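As a rough sketch of what "decoupling" can mean here, the Python below contrasts a coupled per-sample loss, where the mode token shares one length normalization with the response tokens, against a decoupled one, where the mode token gets its own weight. It is a simplified illustration, not the Thinkless implementation: the function names, the `alpha` weight, and the REINFORCE-style surrogate (no clipping, no KL term) are all assumptions made for the example.

```python
from statistics import mean, pstdev


def group_advantages(rewards):
    """Standardize rewards within one sampled group (group-relative advantages)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # Every sample scored the same, so there is no learning signal.
        return [0.0 for _ in rewards]
    return [(r - mu) / sigma for r in rewards]


def coupled_sample_loss(mode_logp, response_logps, advantage):
    """Mode token and response tokens share a single length normalization."""
    token_logps = [mode_logp] + list(response_logps)
    return -advantage * sum(token_logps) / len(token_logps)


def decoupled_sample_loss(mode_logp, response_logps, advantage, alpha=0.5):
    """Mode token is weighted by alpha (hypothetical value); response is averaged on its own."""
    mode_term = alpha * mode_logp
    response_term = sum(response_logps) / len(response_logps)
    return -advantage * (mode_term + response_term)


if __name__ == "__main__":
    # Two sampled answers to one prompt: a short one that was right, a long one that was wrong.
    advantages = group_advantages([1.0, 0.0])
    short = decoupled_sample_loss(-0.5, [-1.0, -1.0], advantages[0])
    long_ = decoupled_sample_loss(-0.5, [-1.0] * 8, advantages[1])
    print(advantages, short, long_)
```

In the coupled form the mode token's share of the loss is 1/(T+1) for a response of T tokens, so it shrinks as the response grows. The decoupled form fixes that share at `alpha` regardless of length, which is the property the routing decision needs.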

But step back and the more interesting question is what "thinking" even is here. One thread of the collection argues the reasoning capability isn't being *built* by training: it's already latent in the base model, and training just learns to elicit it. Five independent methods all surface reasoning that was already sitting in base-model activations, suggesting the real bottleneck is selection, not acquisition (see "Do base models already contain hidden reasoning ability?"). If that's true, then "learning when to think" is less about teaching a skill and more about learning a *switch* over a capability the model already has, which is exactly what Thinkless's routing does.

That reframing gets sharper when you ask whether visible thinking is even where the reasoning happens. Some work shows models reasoning in latent space, scaling test-time compute through hidden-state iteration without emitting any thinking tokens at all, which hints that verbalized chains-of-thought are a training artifact rather than a requirement (see "Can models reason without generating visible thinking tokens?"). Other work catches models computing the answer in early layers and then *overwriting* it with format-compliant filler (see "Do transformers hide reasoning before producing filler tokens?"), and a separate study finds that reasoning traces behave like persuasive mimicry: invalid logical steps perform nearly as well as valid ones (see "Do reasoning traces show how models actually think?"). So a model that "chooses to think" may be choosing to *show* thinking, which isn't the same as choosing to compute more.
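The "computed early, then overwritten" finding comes from a logit-lens readout, which projects an intermediate hidden state through the output vocabulary matrix to see which token that layer would predict. The toy below shows only the mechanics. Every number in it is made up for illustration (the three-token vocabulary, the 2-dimensional hidden states, the layer indices), and it omits the final layer normalization that real logit-lens analyses usually apply.

```python
# Toy logit lens: all values are hypothetical, chosen only to show the mechanics.
VOCAB = ["7", "...", "the"]

# Hypothetical unembedding matrix: one row of weights per vocabulary token.
UNEMBED = [
    [1.0, 0.0],    # "7"   (stands in for the correct answer token)
    [0.0, 1.0],    # "..." (stands in for a filler token)
    [-1.0, -1.0],  # "the"
]

# Made-up hidden states at the answer position, keyed by layer index.
HIDDEN_BY_LAYER = {
    2: [2.0, 0.5],    # early layer
    12: [1.5, 3.0],   # final layer
}


def logits(hidden):
    """Project a hidden state onto the vocabulary (dot product with each row)."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in UNEMBED]


def ranked_tokens(hidden):
    """Return vocabulary tokens ordered from highest to lowest logit."""
    scores = logits(hidden)
    order = sorted(range(len(VOCAB)), key=lambda i: scores[i], reverse=True)
    return [VOCAB[i] for i in order]


if __name__ == "__main__":
    for layer, hidden in HIDDEN_BY_LAYER.items():
        print(layer, ranked_tokens(hidden))
```

In this toy, the early layer ranks the answer token first, while the final layer ranks the filler token first and leaves the answer in second place. That mirrors the shape of the claim that the reasoning stays recoverable from lower-ranked predictions.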

There's also a second, quieter kind of "when to answer" the corpus treats as a sibling problem: knowing when *not* to answer yet. Models can be trained to abstain when uncertain rather than guess, with small calibrated models matching ones 10x larger (see "Can models learn to abstain when uncertain about predictions?"), and to proactively notice missing information and ask instead of charging ahead: RL lifted that behavior from 0.15% to about 74% on deliberately flawed problems (see "Can models learn to ask clarifying questions instead of guessing?"). Strikingly, social meta-learning produces the same delay-and-ask behavior as an *emergent* meta-strategy even when models were only ever trained on fully-specified problems (see "Can models learn to ask clarifying questions without explicit training?").
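The simplest way to picture abstention is a confidence threshold over calibrated probabilities. The sketch below is that swapped-in baseline, not the uncertainty-aware training objective the cited work uses. The `decide` function, the `threshold` value, and the label names are hypothetical.

```python
def decide(probabilities, threshold=0.7):
    """Return the top label if its probability clears the threshold, else 'ABSTAIN'.

    `probabilities` maps each candidate label to a (assumed calibrated) probability.
    """
    label, confidence = max(probabilities.items(), key=lambda item: item[1])
    return label if confidence >= threshold else "ABSTAIN"


if __name__ == "__main__":
    # Hypothetical forecasts for whether a conversation will derail.
    print(decide({"derails": 0.9, "stays_civil": 0.1}))
    print(decide({"derails": 0.55, "stays_civil": 0.45}))
```

A threshold like this only helps if the probabilities are calibrated, which is why the abstention result is framed as a training problem rather than a decoding trick.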

Put together, the collection reframes your question into three: can a model learn to spend more compute (yes — routing works), can it learn to *show* reasoning honestly (murkier — the visible trace may not be the real computation), and can it learn to hold off answering entirely (yes — abstention and clarification are trainable, sometimes even emergent). The thing you didn't know you wanted to know: the hardest part of "when to think" may not be the thinking — it's that the model already can, and the skill being learned is restraint.

Sources 8 notes

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Do reasoning traces show how models actually think?

LLM reasoning traces function as persuasive appearances rather than reliable explanations of computation. Invalid logical steps perform nearly as well as valid ones, and corrupted traces generalize comparably, showing that semantic correctness is not what produces the performance gains.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Can models learn to ask clarifying questions instead of guessing?

RL training raised proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Can models learn to ask clarifying questions without explicit training?

Models trained via social meta-learning (SML) on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.
