Do reflection tokens carry more information about correct answers?
Explores whether tokens expressing reflection and transitions disproportionately concentrate information about reasoning outcomes compared to other tokens, and what role they play in reasoning performance.
By tracking mutual information (MI) between intermediate representations and the correct answer at each step of large reasoning model (LRM) inference, a striking pattern emerges: MI spikes suddenly at specific steps, creating sparse, non-uniform "MI peaks" throughout the reasoning process.
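One way to make a per-step MI curve concrete (a minimal sketch, not the paper's estimator): a probe's held-out cross-entropy gives a variational lower bound, since I(H; Y) ≥ H(Y) − CE(probe). The function names and data layout below are illustrative assumptions.

```python
# Hedged sketch (not the paper's estimator): a linear probe's held-out
# cross-entropy yields a variational lower bound I(H; Y) >= H(Y) - CE.
# `hidden` and `labels` are illustrative: hidden states collected at one
# reasoning step across many problems, and integer-coded correct answers.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

def mi_lower_bound(hidden: np.ndarray, labels: np.ndarray) -> float:
    """hidden: (n_samples, d) states at one step; labels: (n_samples,) answer ids."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        hidden, labels, test_size=0.3, random_state=0
    )
    probe = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    ce = log_loss(y_te, probe.predict_proba(X_te), labels=probe.classes_)  # nats
    p = np.bincount(y_te) / len(y_te)            # empirical answer marginal
    h_y = -np.sum(p[p > 0] * np.log(p[p > 0]))   # H(Y) in nats
    return max(0.0, h_y - ce)
```

Plotting this bound across reasoning steps is what would surface the sparse peaks the note describes.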
These peaks overwhelmingly correspond to tokens expressing reflection, self-correction, or transitions — "Wait," "Hmm," "Therefore," "So" — which the authors term "thinking tokens." Three key findings:
Thinking tokens are functionally necessary. Fully suppressing them significantly harms reasoning performance, while randomly suppressing the same number of other tokens has minimal impact: the information is concentrated in the thinking tokens, not distributed across the trace.
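A rough way to reproduce this contrast (hedged: the checkpoint name and token list are placeholders, and the paper's exact intervention may differ) is to ban the thinking-token ids at decode time and compare against banning an equal number of random ids.

```python
# Hedged sketch of the suppression contrast; the model id and the
# thinking-token list are placeholders, not the paper's exact setup.
import random
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("a-reasoning-model")       # placeholder id
model = AutoModelForCausalLM.from_pretrained("a-reasoning-model")

thinking_words = ["Wait", "Hmm", "Therefore", "So"]
thinking_ids = [tok(w, add_special_tokens=False).input_ids for w in thinking_words]
# Control condition: ban the same number of randomly chosen vocabulary ids.
random_ids = [[i] for i in random.sample(range(tok.vocab_size), len(thinking_ids))]

def solve(prompt: str, banned: list[list[int]]) -> str:
    inputs = tok(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=512, bad_words_ids=banned)
    return tok.decode(out[0], skip_special_tokens=True)
```

Scoring `solve(q, thinking_ids)` against `solve(q, random_ids)` over a benchmark should show accuracy dropping sharply only in the thinking-token condition, if the concentration claim holds.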
MI peaks are a training artifact. Base models (e.g., LLaMA-3.1-8B) show no clear MI-peak pattern; the distinct structure emerges from reasoning-intensive training such as RL post-training. This suggests reasoning training teaches models to concentrate information at specific reflection points.
Two practical improvements follow. Representation Recycling (allowing MI-peak representations to iterate through the model multiple times) improves accuracy by 20% on AIME24. Thinking Token Test-time Scaling (forcing continued reasoning from thinking tokens when budget remains) yields steady performance improvements.
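The test-time-scaling idea is easy to sketch (hedged: the resume token and loop mechanics below are illustrative choices, and the recycling intervention, which re-runs MI-peak representations through the layer stack, needs model-internal access this sketch omits).

```python
# Hedged sketch of Thinking Token Test-time Scaling: whenever generation
# halts with budget left, append a thinking token and resume decoding.
def tts_generate(model, tok, prompt: str, budget: int = 2048,
                 resume_token: str = "Wait") -> str:
    text, used = prompt, 0
    while used < budget:
        inputs = tok(text, return_tensors="pt")
        out = model.generate(**inputs, max_new_tokens=budget - used)
        used += out.shape[1] - inputs.input_ids.shape[1]  # new tokens spent
        text = tok.decode(out[0], skip_special_tokens=True)
        if used >= budget:
            break
        text += f" {resume_token},"  # force further reasoning from a thinking token
    return text
```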
This provides an information-theoretic complement to the sentence-level thought-anchors finding. "Which sentences actually steer a reasoning trace?" identifies planning and backtracking sentences via counterfactual, attention, and causal-suppression methods; MI peaks identify the same pivotal role via information theory, converging from a different analytical direction.
The convergence across methods (counterfactual importance, attention patterns, causal suppression, and now mutual information) and across granularity levels (token-level MI peaks, sentence-level thought anchors, RLVR's high-entropy forking tokens) strongly supports the claim that reasoning traces have a sparse-pivot structure. Most tokens are filler; a small subset carries the reasoning signal.
Source: MechInterp
Related concepts in this collection
- Which sentences actually steer a reasoning trace? Can we identify which sentences in a reasoning trace have outsized influence on the final answer? Three independent methods converge on a surprising answer about planning and backtracking. Relation: sentence-level complement; MI peaks add information-theoretic evidence for the same sparse-pivot structure.
- Do only 20 percent of tokens actually matter for reasoning? Chain-of-thought reasoning might depend on a small minority of high-entropy tokens that act as decision points. If true, could training that focuses only on these critical tokens match or exceed full-gradient updates? Relation: token-level RLVR analog; high-entropy tokens during training correspond to MI-peak tokens during inference.
- Does more thinking time always improve reasoning accuracy? Explores whether extending a model's thinking tokens linearly improves performance, or if there's a point beyond which additional reasoning becomes counterproductive. Relation: MI peaks explain what matters within the token budget; it's the density of thinking tokens, not total length.
- Does RL teach reasoning or just when to use it? Does reinforcement learning in thinking models actually create new reasoning abilities, or does it simply teach existing capabilities when to activate? This matters for understanding where reasoning truly emerges. Relation: MI peaks as a mechanistic signature; RL training creates the MI-peak pattern that base models lack.
- Do reasoning cycles in hidden states reveal aha moments? What if the internal loops in model reasoning—visible in hidden-state topology—correspond to the reconsidering moments that happen during reasoning? This note explores whether graph cyclicity captures a mechanistic signature of insight. Relation: hidden-state topology confirms the same sparse-pivot structure at the representation-graph level; cyclicity corresponds to backtracking tokens (MI peaks at self-correction), and diameter tracks exploration breadth.
- Can we measure how deeply a model actually reasons? What if reasoning quality isn't about length or confidence, but about how much a model's predictions shift across its internal layers? Can tracking these shifts reveal genuine thinking versus pattern-matching? Relation: complementary token-level measurement; MI peaks identify which tokens matter via information theory, while DTR identifies how deeply the model computes at each token via layer-wise prediction stabilization. Both analyses converge on reasoning having a concentrated structure rather than uniform information distribution.
Original note title: thinking tokens are mutual information peaks — sparse reflection and transition tokens carry disproportionate information about correct answers