LLM Reasoning and Architecture · Language Understanding and Pragmatics · Reinforcement Learning for LLMs

Which sentences actually steer a reasoning trace?

Can we identify which sentences in a reasoning trace have outsized influence on the final answer? Three independent methods converge on a surprising answer about planning and backtracking.

Note · 2026-02-22 · sourced from Reasoning Methods CoT ToT

Mechanistic interpretability of reasoning traces typically focuses on token-level activations. The "Thought Anchors" paper takes a sentence-level approach, arguing that sentences are a more coherent unit for understanding reasoning than tokens but more granular than paragraphs.

Three complementary methods are applied to the same reasoning traces (rough code sketches follow the list):

  1. Counterfactual resampling (black-box): For each sentence, resample 100 completions conditioned on that sentence being present vs. replaced with a different-meaning sentence. Sentences that significantly shift the final answer distribution have high counterfactual importance.

  2. Attention pattern analysis (white-box): Identify "receiver heads" — attention heads that narrow their focus toward specific past sentences. Sentences that receive disproportionate attention from these receiver heads are mechanistically central to downstream computation.

  3. Causal suppression (white-box): Mask attention toward each sentence from all subsequent tokens, then measure the KL divergence between the resulting and original distributions over those subsequent tokens. Sentences whose suppression has large downstream effects are causally active.
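
To make the black-box procedure concrete, here is a minimal sketch of counterfactual importance scoring. The helpers sample_continuations (draws final answers given a prefix) and replace_sentence (produces a different-meaning substitute) are hypothetical placeholders, and total-variation distance is an illustrative choice of divergence, not necessarily the paper's exact metric.

```python
import numpy as np
from collections import Counter

def counterfactual_importance(sentences, sample_continuations,
                              replace_sentence, n_samples=100):
    """Score each sentence by how much replacing it shifts the answer distribution."""
    scores = []
    for i, sent in enumerate(sentences):
        prefix_kept = " ".join(sentences[: i + 1])
        prefix_swapped = " ".join(sentences[:i] + [replace_sentence(sent)])

        # Empirical answer distributions under the two prefixes.
        kept = Counter(sample_continuations(prefix_kept, n_samples))
        swapped = Counter(sample_continuations(prefix_swapped, n_samples))

        # Total-variation distance between the two answer distributions.
        support = set(kept) | set(swapped)
        tv = 0.5 * sum(abs(kept[a] - swapped[a]) / n_samples for a in support)
        scores.append(tv)
    return np.array(scores)
```

Sentences with the highest scores are the candidate anchors; per the convergence result below, planning and backtracking sentences dominate that ranking.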
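
The two white-box measurements can be sketched as follows, assuming access to a per-head attention matrix and a HuggingFace-style model whose 2D attention_mask zeroes out chosen key positions; the paper intervenes on attention more surgically, so treat the names and interfaces here as assumptions.

```python
import torch
import torch.nn.functional as F

def sentence_attention_received(attn, sentence_spans):
    """Per-sentence receiver score for one head: how much attention tokens
    after a sentence pay back to that sentence's tokens.
    attn: (seq, seq) attention matrix (rows = queries, cols = keys).
    sentence_spans: list of (start, end) token index ranges, end exclusive."""
    scores = []
    for start, end in sentence_spans:
        incoming = attn[end:, start:end]  # queries after the sentence -> its keys
        scores.append(incoming.sum(dim=-1).mean().item() if incoming.numel() else 0.0)
    return scores

def suppression_kl(model, input_ids, sentence_span):
    """Causal suppression sketch: KL divergence between the distributions over
    subsequent tokens with and without attention to one sentence's tokens."""
    start, end = sentence_span
    with torch.no_grad():
        base = model(input_ids).logits
        mask = torch.ones_like(input_ids)
        mask[:, start:end] = 0            # hide the sentence from all queries
        suppressed = model(input_ids, attention_mask=mask).logits
    p = F.log_softmax(base[0, end:], dim=-1)        # original distributions
    q = F.log_softmax(suppressed[0, end:], dim=-1)  # distributions after masking
    # Mean KL(original || suppressed) over the tokens that follow the sentence.
    return F.kl_div(q, p, log_target=True, reduction="batchmean").item()
```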

All three methods converge on the same subset of sentences: planning sentences (establishing the direction of reasoning) and backtracking sentences ("Wait...", "Actually...", error-correction steps). These are the thought anchors — sentences that disproportionately guide what comes after.

The finding that backtracking sentences are thought anchors extends Why do correct reasoning traces contain fewer tokens? and Do hedging markers actually signal careful thinking in AI?. Backtracking is not mere noise — it is a functional pivot. A backtracking sentence recognized as a thought anchor shifts the entire subsequent reasoning trajectory.

This also reveals why receiver heads in reasoning models are more narrowly focused than in base models: the reasoning-trained model has learned to weight certain past sentences more heavily as guides for subsequent generation. This attentional specialization is the mechanistic signature of structured reasoning.

Practical implication: if you want to evaluate whether a reasoning trace is doing real work, identify the thought anchors. If you want to steer reasoning, these are the leverage points. The anchors are not uniformly distributed — sparse critical sentences dominate.

Information-theoretic confirmation (MI Peaks): The "Demystifying Reasoning Dynamics with Mutual Information" paper provides a fourth convergent method. By tracking mutual information (MI) between intermediate representations and the correct answer across reasoning steps, they find MI peaks — positions where information about the correct answer suddenly spikes. These peaks are sparse and non-uniformly distributed. Crucially, MI peaks correspond to the same class of tokens identified as thought anchors: reflection tokens ("Wait," "Hmm"), transition tokens ("Therefore," "So"), and self-correction tokens. Suppressing these thinking tokens significantly degrades reasoning performance, while suppressing the same number of random tokens has minimal impact. The paper also proposes Representation Recycling (RR) — allowing representations at MI peaks to undergo multiple iterations through the model — which improves accuracy by up to 20% on hard benchmarks. This is the first technique that directly exploits thought anchor identification for performance improvement. See Do reflection tokens carry more information about correct answers?.
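
A minimal sketch of peak detection, assuming step_scores is some per-step proxy for mutual information between the intermediate representation and the correct answer (for example, a linear probe's log-probability of the gold answer); the paper uses an actual MI estimator, so this only illustrates the shape of the analysis.

```python
import numpy as np

def find_mi_peaks(step_scores, z_threshold=2.0):
    """Flag reasoning steps whose answer-information score spikes above the trend."""
    scores = np.asarray(step_scores, dtype=float)
    deltas = np.diff(scores, prepend=scores[0])          # step-to-step jumps
    z = (deltas - deltas.mean()) / (deltas.std() + 1e-8)  # standardized jumps
    return np.where(z > z_threshold)[0]                   # indices of peak steps
```

If the MI Peaks finding holds, these indices should land on reflection, transition, and self-correction tokens; Representation Recycling would then route exactly those positions through additional passes over the model.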

Token-level memorization sources (STIM, 2508.02037): The STIM framework shifts the analysis to the token level, identifying three distinct sources of memorization that cause reasoning errors: (1) local memorization from frequent continuations of the immediately preceding tokens (the dominant error source, up to 67% of wrong tokens), (2) mid-range memorization from co-occurrence with the generation prefix, and (3) long-range memorization from co-occurrence with prompt tokens. Under distributional shift toward rare inputs, all three sources intensify. High STIM memorization scores predict erroneous tokens with high Precision@k and Recall@k. This adds a complementary mechanism to the thought anchor framework: while thought anchors identify which sentences are structurally important (planning/backtracking), STIM identifies which tokens within those sentences are driven by memorization rather than reasoning. A thought anchor sentence could contain tokens that are both mechanistically pivotal and memorization-driven — explaining why structurally important reasoning steps can nevertheless produce errors. See Where do memorization errors arise in chain-of-thought reasoning?.
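
As a toy illustration of the local-memorization component, the sketch below scores a token by its continuation probability given the immediately preceding n-gram, where ngram_count is a hypothetical lookup into pretraining-corpus statistics; the actual STIM scores are defined differently in the paper.

```python
def local_memorization_score(tokens, i, ngram_count, max_order=4):
    """Toy score: how predictable token i is from its immediate left context
    under corpus n-gram statistics (a stand-in for STIM's local memorization)."""
    best = 0.0
    for order in range(1, max_order + 1):
        if i - order < 0:
            break
        context = tuple(tokens[i - order:i])
        joint = ngram_count(context + (tokens[i],))   # count of context + token
        marginal = ngram_count(context)               # count of context alone
        if marginal:
            best = max(best, joint / marginal)        # continuation probability
    return best
```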

Token-level mechanistic refinement: The "Beyond 80/20" RLVR analysis provides a finer-grained version of the same insight at the token level. High-entropy minority tokens — the ~20% of tokens where the model's probability distribution is most uncertain — are the critical forking points where RLVR's gradient signal is concentrated. Restricting gradient updates to only these tokens matches or exceeds full updates. These high-entropy tokens are the token-level analog of sentence-level thought anchors: both identify sparse critical junctures where reasoning trajectory can diverge. The convergence across levels of analysis (tokens, sentences) reinforces that reasoning traces have a sparse-pivot structure at multiple granularities. See Do only 20 percent of tokens actually matter for reasoning?.
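
A minimal sketch of the token-selection idea, assuming per-token logits, sampled token ids, and advantages from some RLVR setup; the 80th-percentile entropy cutoff mirrors the top-20% selection described above, but this is an illustrative loss, not the paper's training code.

```python
import torch

def high_entropy_policy_loss(logits, actions, advantages, keep_fraction=0.2):
    """Policy-gradient loss restricted to the highest-entropy tokens.
    logits: (seq, vocab), actions: (seq,) sampled token ids, advantages: (seq,)."""
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    entropy = -(probs * log_probs).sum(dim=-1)             # per-token entropy

    cutoff = torch.quantile(entropy, 1.0 - keep_fraction)  # top 20% by entropy
    mask = (entropy >= cutoff).float()

    token_log_probs = log_probs.gather(-1, actions.unsqueeze(-1)).squeeze(-1)
    # Gradient flows only through the high-entropy "forking" tokens.
    return -(mask * advantages * token_log_probs).sum() / mask.sum().clamp(min=1.0)
```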


Source: Reasoning Methods CoT ToT, RLVR, Memory

Original note title: thought anchors are planning and backtracking sentences with disproportionate causal influence on reasoning traces