Do prior errors in context history amplify future errors?
When a language model makes mistakes early in a task, do those errors contaminate subsequent predictions? We explore whether error accumulation degrades long-horizon performance through passive context pollution rather than capability limits.
A model executing a long-horizon task makes errors. Those errors remain in the context. The model then predicts the next token conditioned on a history that contains its own mistakes. Error probability increases. More errors accumulate. Performance degrades faster than a constant per-step error rate would predict.
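The compounding is easy to illustrate with a toy simulation. The sketch below (Python; the `base` and `sensitivity` parameters are illustrative assumptions, not measured values from the source) compares a constant per-step error rate against one that rises with the fraction of prior errors in the history:

```python
import random

def mean_error_fraction(horizon: int, base: float, sensitivity: float,
                        trials: int = 20_000, seed: int = 0) -> float:
    """Average fraction of erroneous steps per episode.

    Each step errs with probability `base`, plus a self-conditioning
    penalty proportional to the fraction of prior steps that erred.
    All parameters here are illustrative assumptions.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        errors = 0
        for t in range(horizon):
            p = base if t == 0 else min(1.0, base + sensitivity * errors / t)
            if rng.random() < p:
                errors += 1
        total += errors / horizon
    return total / trials

for horizon in (10, 50, 200):
    flat = mean_error_fraction(horizon, base=0.02, sensitivity=0.0)
    cond = mean_error_fraction(horizon, base=0.02, sensitivity=0.5)
    print(f"T={horizon:3d}  constant={flat:.3f}  self-conditioned={cond:.3f}")

# With sensitivity s, the error fraction drifts toward base / (1 - s),
# the fixed point of f = base + s * f: double the base rate here.
```

The constant-rate column stays flat at the base rate at every horizon; the self-conditioned column drifts upward as the horizon grows, which is exactly the faster-than-constant degradation described above.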
This self-conditioning effect is empirically verified by controlling the error rate in the history shown to the model. As the error rate in prior context increases, subsequent step accuracy drops sharply. The mechanism is straightforward: models are trained to predict the most likely next token given context; when the context contains errors, those errors become part of the distribution being continued.
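A minimal sketch of that controlled-error-rate protocol, assuming a generic `model_complete(prompt)` call and a task object with `sample`, `solve`, `corrupt`, and `check` hooks (all of these names are hypothetical stand-ins, not an API from the source):

```python
import random

def build_history(steps, error_rate, solve_step, corrupt_step, rng):
    """Render a partial trajectory in which a controlled fraction of
    steps is replaced with a deliberately wrong answer."""
    lines = []
    for i, step in enumerate(steps):
        answer = solve_step(step)
        if rng.random() < error_rate:
            answer = corrupt_step(answer)  # inject an error into the history
        lines.append(f"Step {i + 1}: {step} -> {answer}")
    return "\n".join(lines)

def next_step_accuracy(model_complete, task, error_rate, trials=100, seed=0):
    """Accuracy on the next step when the shown history contains
    errors at a fixed, controlled rate."""
    rng = random.Random(seed)
    correct = 0
    for _ in range(trials):
        steps, probe = task.sample(rng)  # history steps plus one probe step
        history = build_history(steps, error_rate, task.solve, task.corrupt, rng)
        prompt = f"{history}\nStep {len(steps) + 1}: {probe} ->"
        correct += task.check(probe, model_complete(prompt))
    return correct / trials

# Sweeping error_rate from 0.0 to 0.5 and plotting next_step_accuracy
# traces the self-conditioning curve: accuracy falls as the injected
# error rate rises, even though the model itself never changes.
```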
Unlike humans, who typically improve at a task with repetition, LLMs become less reliable as their context fills with their own mistakes. Repetition brings not practice but contamination.
Three practical implications:
- Model scaling does not fix this. Larger models self-condition just as much as smaller ones; the problem is not capability but the conditional prediction objective itself.
- Long-horizon failure attribution matters. What looks like a reasoning or planning failure in long tasks is often an execution failure caused by error accumulation: the model had the capability, but its own prior outputs degraded it.
- Thinking models fix self-conditioning. Models like R1 are not affected by prior mistakes in the same way, and sequential test-time compute greatly extends the task length a model can complete (DeepSeek-V3 fails at 2 steps; R1 executes 200). The thinking process appears to insulate reasoning from error-contaminated context.
This is distinct from "Does self-revision actually improve reasoning in language models?". Self-revision is active: the model deliberately re-examines its own reasoning, and that re-examination can inject new errors. Self-conditioning is passive: no deliberate revision is required, only the accumulation of prior errors in context.
Source: Reasoning Critiques
Related concepts in this collection
- Does self-revision actually improve reasoning in language models? When o1-like models revise their own reasoning through tokens like 'Wait' or 'Alternatively', does this reflection catch and fix errors, or does it introduce new mistakes? This matters because self-revision is marketed as a key capability. (Relation: active error injection via deliberate re-examination; self-conditioning is passive contamination by accumulated context errors.)
- Does failed-step fraction predict reasoning quality better? Can we use the fraction of abandoned reasoning branches to forecast whether a model will solve a problem correctly? This matters because it could guide more efficient test-time scaling than simply adding more tokens. (Relation: failed branches bias subsequent reasoning through a similar mechanism; abandoned paths remain in context and contaminate later steps.)
- Do iterative refinement methods suffer from overthinking? Iterative refinement approaches like Self-Refine structurally resemble token-level overthinking in o1-like models. Does revision across multiple inference calls reproduce the same accuracy degradation seen within single inferences? (Relation: error accumulation across iterations follows the same contamination logic.)
- How quickly do errors compound during model self-training? When LLMs train on their own outputs without verification, do small mistakes amplify exponentially? This matters because it determines whether unsupervised self-improvement is even feasible. (Relation: the training-time analog; self-conditioning contaminates the inference context within a single generation, error avalanching contaminates training data across self-training iterations, and both produce compounding degradation from a model's own outputs.)
- Does a model improve by arguing with itself? When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenges from other models? (Relation: the active-confidence version; self-conditioning passively degrades accuracy through context contamination, while DoT actively amplifies confidence in wrong answers. Both are single-source error loops, distinguished by whether the mechanism is passive accumulation or active reinforcement.)
- Does training on messy search processes improve reasoning? Can language models learn better problem-solving by observing full exploration trajectories, including mistakes and backtracking, rather than only optimal solutions? This matters because current LMs rarely see the decision-making process itself. (Relation: SoS training directly addresses self-conditioning; models that learn to recognize dead ends and backtrack can break the error accumulation cycle rather than continuing to condition on their own mistakes. The backtracking mechanism provides an exit ramp from the contamination spiral.)
Original note title: "self-conditioning effect — prior errors in context history amplify future error rates in long-horizon tasks"