How do verbose and concise reasoning occupy different regions in activation space?

This explores what it means for an LLM to spend more or fewer words on reasoning, and the finding that verbosity is not scattered noise: it lies along a measurable direction in the model's internal activations, so it can be dialed up or down.

Looking inside the model when reasoning runs long versus short, the surprising answer is that verbosity is geometric. It lies along a distinct, linear direction in the model's activation space rather than being an emergent property of the problem. Can we steer reasoning toward brevity without retraining? shows that a single steering vector, extracted from just 50 paired examples, can slide a model's chain of thought along this direction, cutting length by 67% while holding accuracy, with no retraining required. That a one-dimensional 'verbosity knob' exists at all is the key clue: it suggests that long and short reasoning are not different kinds of thinking but the same computation expressed at different lengths.
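A minimal sketch of how such a direction could be extracted and applied, assuming the common difference-of-means construction (the note does not spell out the exact recipe). The activations are toy 3-dimensional vectors rather than real hidden states, and `steering_vector` and `steer` are hypothetical helpers, not the method's published API.

```python
from typing import List

Vector = List[float]


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def steering_vector(verbose: List[Vector], concise: List[Vector]) -> Vector:
    """mean(concise) - mean(verbose): points from the verbose region toward the concise one."""
    if len(verbose) != len(concise):
        raise ValueError("expected the same number of paired examples")
    mv, mc = mean(verbose), mean(concise)
    return [c - v for c, v in zip(mc, mv)]


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction; alpha > 0 pushes toward concision."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy paired activations standing in for hidden states of verbose/concise chains.
verbose_acts = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.7]]
concise_acts = [[0.2, 2.1, 0.4], [0.4, 1.9, 0.6]]

v = steering_vector(verbose_acts, concise_acts)
print([round(x, 3) for x in v])
print([round(x, 3) for x in steer([1.1, 1.9, 0.6], v, alpha=1.0)])
```

In a real model the shifted state would replace the hidden state at some chosen layer during generation; which layer, and how large `alpha` should be, are assumptions this sketch leaves open.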

The corroborating evidence is that most of those extra words weren't doing computational work in the first place. Can minimal reasoning chains match full explanations? finds that Chain of Draft matches full chain-of-thought accuracy at 7.6% of the token cost; the other 92.4% served style and documentation, not reasoning. So if verbose and concise modes land in separate regions of activation space, it is plausibly because the verbose region is padded with tokens that perform readability rather than inference. This reframes concise reasoning not as a compressed version of verbose reasoning but as the actual computational core with the packaging stripped away.
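The token arithmetic behind that ratio is simple. The sketch below uses hypothetical counts (1,000 baseline tokens, 76 minimal tokens) chosen only to reproduce the reported 7.6% figure; `token_savings` is an illustrative helper.

```python
from typing import Tuple


def token_savings(baseline_tokens: int, minimal_tokens: int) -> Tuple[float, float]:
    """Return (fraction kept, fraction removed) relative to the baseline chain."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    kept = minimal_tokens / baseline_tokens
    return kept, 1.0 - kept


# Hypothetical counts chosen only to match the reported 7.6% ratio.
kept, removed = token_savings(baseline_tokens=1000, minimal_tokens=76)
print(f"kept {kept:.1%}, removed {removed:.1%}")
```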

What makes the geometry more than a curiosity is that length has an optimum, and models drift toward it on their own. Why does chain of thought accuracy eventually decline with length? shows that accuracy peaks at an intermediate length, one that grows with task difficulty but shrinks as models get stronger; RL training gravitates toward these shorter chains without ever being told to. Does more thinking time always improve reasoning accuracy? sharpens the warning: pushing from ~1,100 to ~16K thinking tokens dropped accuracy from 87.3% to 70.3%. Verbosity past the sweet spot isn't neutral; it actively degrades accuracy. So the 'verbose region' of activation space is partly a region of overthinking, and steering toward concision is often steering back toward the productive zone.
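Because the accuracy curve is an inverted U, picking a thinking budget amounts to an argmax over a measured length sweep rather than "as long as possible". The accuracy values below are invented for illustration and are not taken from the cited studies; `optimal_length` is a hypothetical helper.

```python
from typing import Dict


def optimal_length(sweep: Dict[int, float]) -> int:
    """Return the chain-of-thought length (tokens) with the highest measured accuracy."""
    if not sweep:
        raise ValueError("empty sweep")
    return max(sweep.items(), key=lambda item: item[1])[0]


# Hypothetical (length -> accuracy) sweep with an inverted-U shape.
sweep = {250: 0.71, 500: 0.80, 1000: 0.86, 2000: 0.84, 4000: 0.78, 8000: 0.72}
best = optimal_length(sweep)
print(f"best length: {best} tokens, accuracy {sweep[best]:.2f}")

# Past the peak, each longer budget scores lower than the one before it.
past_peak = [acc for length, acc in sorted(sweep.items()) if length >= best]
print(past_peak == sorted(past_peak, reverse=True))
```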

The most radical adjacent framing is that visible reasoning tokens may be a presentation layer rather than the reasoning itself. Can models reason without generating visible thinking tokens? shows depth-recurrent models, Heima, and Coconut scaling test-time compute through hidden-state iteration with no verbalized steps at all, suggesting that verbalization is a training artifact. Do transformers hide reasoning before producing filler tokens? goes further: in models trained with hidden CoT tokens, logit-lens analysis finds the correct answer computed in layers 1–3 and then actively suppressed in the final layers to emit format-compliant filler. If the real reasoning happens in latent space and the words are downstream decoration, then 'verbose vs concise regions in activation space' may be measuring how much a model chooses to externalize, not how hard it is thinking.
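The logit-lens idea behind the layers 1–3 finding can be sketched in a few lines: decode each layer's hidden state directly through the unembedding matrix and track where a token ranks. This toy version assumes a made-up 3-token vocabulary and 2-dimensional states, and it omits the final layer norm that real logit-lens implementations usually apply before unembedding.

```python
from typing import List

Vector = List[float]
Matrix = List[Vector]  # one row per vocabulary entry


def logits(hidden: Vector, unembed: Matrix) -> Vector:
    """Score each vocabulary entry by its dot product with the hidden state."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in unembed]


def rank_of(token_id: int, scores: Vector) -> int:
    """0-based rank of token_id when scores are sorted from highest to lowest."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order.index(token_id)


def logit_lens(per_layer_hidden: List[Vector], unembed: Matrix, token_id: int) -> List[int]:
    """Rank of token_id at every layer when that layer is decoded directly."""
    return [rank_of(token_id, logits(h, unembed)) for h in per_layer_hidden]


# Toy vocabulary: 0 = "answer", 1 = "filler", 2 = "other".
unembed = [[1.0, 0.0], [0.0, 1.0], [0.2, 0.2]]
layers = [
    [0.9, 0.1],  # early layer: the answer direction dominates
    [0.8, 0.3],
    [0.3, 0.9],  # final layer: filler promoted, answer demoted
]
print("answer ranks:", logit_lens(layers, unembed, token_id=0))
print("filler ranks:", logit_lens(layers, unembed, token_id=1))
```

Here the answer goes from rank 0 to rank 1 while filler rises to rank 0, mirroring the described pattern: the answer leads early, is suppressed at the output, and remains recoverable among the lower-ranked predictions.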

Two further threads round this out. Can models learn when to think versus respond quickly? turns the geometry into a control problem: Thinkless trains a single model to route between extended thinking and direct answers, calibrating verbosity per question. And Do language models sparsify their activations under difficult tasks? hints that activation geometry also shifts with difficulty, with hidden states becoming sparser on hard, unfamiliar tasks. Read together, the picture is that verbosity is a navigable axis in the model's internal space: separable, steerable, often padded, and frequently not where the thinking actually lives.
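The sparsity note does not say which metric it uses, so the sketch below assumes two common ones: Hoyer's measure (0 for a uniform vector, 1 for a one-hot vector) and the fraction of near-zero entries under a hypothetical threshold `eps`. The two vectors are toy values, not measured activations.

```python
import math
from typing import List


def hoyer_sparsity(x: List[float]) -> float:
    """Hoyer (2004) sparsity: 0 for a uniform vector, 1 for a one-hot vector."""
    n = len(x)
    l1 = sum(abs(v) for v in x)
    l2 = math.sqrt(sum(v * v for v in x))
    if n < 2 or l2 == 0.0:
        raise ValueError("need a non-zero vector with at least two entries")
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def near_zero_fraction(x: List[float], eps: float = 1e-2) -> float:
    """Share of entries whose magnitude is below eps (eps is an assumed threshold)."""
    return sum(abs(v) < eps for v in x) / len(x)


# Toy hidden states: a dense "easy-task" vector and a peakier "hard-task" vector.
easy = [0.5, 0.4, 0.6, 0.5]
hard = [1.2, 0.0, 0.005, 0.1]
print(round(hoyer_sparsity(easy), 3), round(hoyer_sparsity(hard), 3))
print(near_zero_fraction(easy), near_zero_fraction(hard))
```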

Sources (8 notes)

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving a 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Can minimal reasoning chains match full explanations?

Chain of Draft achieves equivalent accuracy to standard chain-of-thought on arithmetic, symbolic, and commonsense tasks while using only 7.6% of tokens. The 92.4% of removed tokens served style and documentation, not computation.

Why does chain of thought accuracy eventually decline with length?

Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than from explicit length constraints.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Can models reason without generating visible thinking tokens?

Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.

Do transformers hide reasoning before producing filler tokens?

Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.

Can models learn when to think versus respond quickly?

Thinkless trains a single model to select between extended reasoning and direct responses using DeGRPO, which decouples mode selection from answer refinement. This prevents mode collapse and enables self-calibrated routing without explicit difficulty labels.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
