Can we measure how much prior errors bias subsequent token predictions?

This explores whether we can quantify error propagation in autoregressive generation — the degree to which a model's own earlier tokens (especially mistaken ones) drag its later predictions off course.

The corpus's sharpest answer comes from work decomposing where reasoning errors actually originate. One framework separates the influence of preceding tokens into local, mid-range, and long-range sources and finds that *local* memorization (prediction biased by the tokens immediately before) accounts for up to 67% of chain-of-thought reasoning errors, a share that grows as problems get harder and the model drifts away from its training distribution (see "Where do memorization errors arise in chain-of-thought reasoning?"). That's close to a direct measurement of your question: it puts a number on how much the recent prior context, errors included, is steering the next token rather than the actual problem.

A second angle reframes the same phenomenon as a tug-of-war between context and what the model already 'knows.' When a model's pretraining associations are strong, they override the information sitting in its context window, so its outputs become inconsistent with what it was just told. The same work shows that text prompting alone can't fix this; you have to intervene causally in the internal representations to measure and shift the balance (see "Why do language models ignore information in their context?"). Read alongside the memorization framework, this suggests the bias you're trying to measure isn't one thing: part is the pull of recent tokens, part is the pull of baked-in priors, and the two can be teased apart.

Here's the twist that complicates a naive measurement: prior tokens being 'wrong' doesn't reliably mean later predictions get worse. Models trained on deliberately corrupted or irrelevant reasoning traces hold their accuracy and sometimes generalize *better* out of distribution, implying the trace often acts as computational scaffolding rather than content the model is reasoning over (see "Do reasoning traces need to be semantically correct?"). So the influence of a prior 'error' depends heavily on which kind of token it is. Other work shows tokens aren't equal: only ~20% are high-entropy 'forking points' where the model genuinely decides direction (see "Do high-entropy tokens drive reasoning model improvements?"), and models internally rank tokens by functional importance, preserving symbolic-computation steps while treating grammar and filler as disposable (see "Which tokens in reasoning chains actually matter most?"). An error at a forking token should bias the future far more than an error in scaffolding.
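The forking-token idea can be made concrete with a small sketch. It assumes you already have the model's next-token logits at each generation step; the function names and the `top_fraction=0.2` cutoff are illustrative choices that echo the ~20% figure above, not the cited work's exact procedure.

```python
import math

def softmax(logits):
    """Convert raw logits to probabilities, shifted by the max for numerical stability."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def forking_positions(logits_per_step, top_fraction=0.2):
    """Return (indices of the highest-entropy steps, entropy at every step).

    top_fraction is an assumed cutoff, not a value taken from any specific paper.
    """
    entropies = [entropy(softmax(step)) for step in logits_per_step]
    k = max(1, round(top_fraction * len(entropies)))
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return sorted(ranked[:k]), entropies

# Toy logits over a 3-token vocabulary: step 1 is uniform (maximally uncertain),
# every other step is sharply peaked on the first token.
toy_logits = [[10, 0, 0], [0, 0, 0], [8, 0, 0], [9, 0, 0], [7, 0, 0]]
forks, step_entropies = forking_positions(toy_logits)
print(forks)  # [1]
```

Under this framing, an injected error would be expected to matter most when it lands on one of the returned positions, which is one way to stratify any downstream bias measurement by token type.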

The practical instrument the corpus offers is the model's own uncertainty. Calibrated token-probability estimates turn out to be a reliable read on when a model is on shaky ground, good enough to outperform more elaborate adaptive-retrieval schemes on single-hop tasks and match them on multi-hop ones at a fraction of the cost (see "Can simple uncertainty estimates beat complex adaptive retrieval?"). That hints at a usable proxy: track where probability mass collapses or spikes after a likely-wrong token, and you have a cheap signal for cascading bias. Put together, the corpus says yes, you can measure prior-error bias, but the meaningful measurement is per-token and source-aware (local vs. parametric, forking vs. scaffolding), not a single global drift number.
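One way to operationalize that per-token signal is a counterfactual comparison, sketched below. This is an assumption of this example rather than a method described in the sources: run the model once on a clean prefix and once on a prefix with an injected error, then compute the KL divergence between the two next-token distributions at each later step. The distributions here are hand-written toy values, and `bias_profile` is a hypothetical helper name.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two discrete distributions over the same vocabulary.

    q is floored at eps so a zero in q cannot cause a division by zero.
    """
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)

def bias_profile(clean_steps, perturbed_steps):
    """Per-step KL between next-token distributions after a clean vs. a perturbed prefix."""
    if len(clean_steps) != len(perturbed_steps):
        raise ValueError("both runs must cover the same number of steps")
    return [kl_divergence(p, q) for p, q in zip(clean_steps, perturbed_steps)]

# Toy distributions over a 3-token vocabulary for the three steps after the injected error.
clean = [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1], [0.7, 0.2, 0.1]]
perturbed = [[0.2, 0.7, 0.1], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]]

profile = bias_profile(clean, perturbed)
for step, value in enumerate(profile):
    print(f"step {step}: KL = {value:.4f}")
```

In this toy case the divergence is largest immediately after the error and decays to zero by the third step, which is the shape a mostly local bias would produce. A profile that stays elevated would instead point to longer-range propagation.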

Sources (6 notes)

Where do memorization errors arise in chain-of-thought reasoning?

STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization—based on preceding tokens—accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Do reasoning traces need to be semantically correct?

Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.

Do high-entropy tokens drive reasoning model improvements?

Only ~20% of tokens exhibit high entropy as pivotal reasoning decision points; RLVR primarily adjusts these forking tokens. Training exclusively on them matches or exceeds full-gradient performance, revealing that the minority carries the learning signal.

Which tokens in reasoning chains actually matter most?

Greedy likelihood-preserving pruning reveals six functional token categories; symbolic computation tokens are preferentially preserved while grammar and meta-discourse are pruned first. Student models trained on these pruned chains outperform those trained on frontier-model compression.

Can simple uncertainty estimates beat complex adaptive retrieval?

Calibrated token-probability uncertainty consistently beats multi-call adaptive retrieval on single-hop tasks and matches performance on multi-hop, using a fraction of the LM and retriever calls. The model's self-knowledge proves more reliable than external heuristics for deciding when to retrieve.
