Is premature decision-making a form of underthinking in transformer models?

This explores whether the way transformer models 'jump to conclusions' too early is the same thing researchers call underthinking — or whether premature commitment and underthinking are two distinct failures that happen to look alike.

The corpus suggests premature decision-making and underthinking are related but not identical: premature commitment is a *timing* problem baked into the architecture, while underthinking is a *quantity* problem about how much computation a model spends. One work pulls apart exactly why models decide too early. Using sparse autoencoders, researchers found that uncertainty signals dominate the early transformer blocks, while the empowerment signals that reward long-term exploration emerge only in the middle blocks (see "Why do large language models explore less effectively than humans?"). Because the decision forms before those later signals can weigh in, the model commits prematurely. That is not the model thinking too little in general; it is the right information arriving too late in the layer stack.

Underthinking, by contrast, shows up on a dose-response curve. Pushing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, a 17-point drop, and the same study notes that models tend to *overthink easy problems and underthink hard ones* (see "Does more thinking time always improve reasoning accuracy?"). So underthinking is one end of a non-monotonic spectrum, namely too little compute on a hard problem, whereas premature decision-making is a structural bias toward closing the loop early regardless of difficulty. The interesting overlap: both can be diagnosed through confidence. ReBalance treats overconfidence and confidence variance as signals for steering a model toward more exploration when it is underthinking and toward less redundancy when it is overthinking, all without retraining (see "Can confidence patterns reveal overthinking versus underthinking?"). Premature deciding tends to read as an overconfident, low-variance state, which is why it can masquerade as underthinking.
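The confidence-based diagnosis described above can be sketched in a few lines. This is a minimal illustration, not ReBalance itself: the function name `diagnose_trace`, the thresholds, and the rule that *high* variance indicates overthinking are all assumptions made for the example. The source only says that overconfidence and confidence variance serve as the diagnostic signals.

```python
from statistics import mean, pvariance


def diagnose_trace(step_confidences, high_conf=0.9, low_var=0.005, high_var=0.03):
    """Label a reasoning trace from per-step confidence values in [0, 1].

    All thresholds are illustrative assumptions, not values from ReBalance.
    """
    if not step_confidences:
        raise ValueError("need at least one confidence value")

    avg = mean(step_confidences)
    var = pvariance(step_confidences)

    # Overconfident and flat: the trace looks like early commitment,
    # so a steering method would push toward more exploration.
    if avg >= high_conf and var <= low_var:
        return "underthinking"

    # Assumption: strongly fluctuating confidence is treated as redundant
    # back-and-forth, so steering would push toward less redundancy.
    if var >= high_var:
        return "overthinking"

    return "balanced"


flat_and_sure = diagnose_trace([0.97, 0.98, 0.96, 0.97])
wavering = diagnose_trace([0.9, 0.3, 0.8, 0.2, 0.95])
steady = diagnose_trace([0.6, 0.65, 0.7, 0.62])
```

Under these assumed thresholds, the flat, highly confident trace is labeled as underthinking, which is the same signature premature deciding tends to produce. That is why a confidence-only diagnostic cannot by itself separate the two failures.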

There's a deeper twist that complicates the whole picture: more thinking isn't automatically better thinking. Vanilla models often use extended reasoning *counterproductively*, talking themselves into self-doubt, and only RL training redirects the same mechanism into productive analysis (see "Does extended thinking help or hurt model reasoning?"). So the cure for premature deciding isn't simply "think longer": a model can think longer and get worse. What matters is how the extra computation is used. The reasoning-trained o1 in the exploration study does overcome premature commitment by *extending computation time*, but in a way that lets the later-emerging exploration signals participate (see "Why do large language models explore less effectively than humans?").

What makes this genuinely surprising is where the reasoning already lives. Models trained with hidden chain-of-thought tokens have been caught computing correct answers in layers 1-3 and then actively suppressing them in the final layers to produce format-compliant filler (see "Do transformers hide reasoning before producing filler tokens?"). Separately, five independent methods all show that base models already contain latent reasoning that minimal training merely *elicits* rather than creates (see "Do base models already contain hidden reasoning ability?"). Read together, premature decision-making looks less like a model that hasn't thought enough and more like a model whose architecture surfaces the wrong signal first and sometimes buries the right one. So it's a cousin of underthinking, not a synonym: a timing-and-routing failure that confidence-based steering can catch, but that simply adding more tokens won't reliably fix.
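The logit-lens idea behind the "computed early, suppressed late" finding can be shown with a toy example. The sketch below is pure Python with made-up numbers: the three-token vocabulary, the hidden states in `HIDDEN`, and the identity unembedding matrix `UNEMBED` are hypothetical and not taken from the cited study. It also skips the final layer normalization that a real logit lens would normally apply before unembedding.

```python
def layer_logits(hidden, unembed):
    """Project one layer's hidden state onto the vocabulary (dot products)."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in unembed]


def logit_lens(hidden_states, unembed):
    """Return the top-scoring token index at every layer."""
    top_tokens = []
    for hidden in hidden_states:
        logits = layer_logits(hidden, unembed)
        top_tokens.append(max(range(len(logits)), key=logits.__getitem__))
    return top_tokens


def rank_of(token, hidden, unembed):
    """Rank of a token at one layer (0 means it is the top prediction)."""
    logits = layer_logits(hidden, unembed)
    order = sorted(range(len(logits)), key=lambda i: -logits[i])
    return order.index(token)


# Hypothetical toy setup: token 0 = answer, 1 = filler, 2 = other.
UNEMBED = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
HIDDEN = [
    [2.0, 0.1, 0.0],  # layer 1: answer dominates
    [1.5, 0.5, 0.1],  # layer 2
    [1.2, 1.0, 0.2],  # layer 3: answer still on top
    [0.8, 2.0, 0.1],  # layer 4: filler takes over
    [0.5, 3.0, 0.0],  # layer 5: filler is emitted
]

per_layer_top = logit_lens(HIDDEN, UNEMBED)
final_answer_rank = rank_of(0, HIDDEN[-1], UNEMBED)
```

In this toy, the answer token is the top prediction in the first three layers and falls to second place in the last layer, so it stays recoverable from a lower-ranked prediction even though the emitted token is filler.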

Sources 6 notes

Why do large language models explore less effectively than humans?

SAE decomposition shows uncertainty values dominate early transformer blocks while empowerment representations emerge only in middle blocks. This temporal mismatch causes models to commit to decisions before long-term exploration signals can influence them. Reasoning-trained o1 overcomes this by extending computation time.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Can confidence patterns reveal overthinking versus underthinking?

ReBalance uses confidence variance and overconfidence as diagnostic signals to apply training-free steering vectors that reduce overthinking redundancy while promoting exploration during underthinking, improving accuracy across models from 0.5B to 32B parameters.

Does extended thinking help or hurt model reasoning?

Vanilla models use thinking mode counterproductively, inducing self-doubt that degrades performance. RL training reverses this, transforming the same mechanism into beneficial gap analysis. Training mediates reasoning quality, not just quantity.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
