How do training associations override context information in language models?
This explores why language models sometimes ignore the information you put in front of them and instead fall back on what they absorbed during training — and what mechanism makes that override happen.
The corpus is fairly direct about the core finding: when a model's parametric knowledge (the associations baked into its weights) is strong enough, it generates answers that contradict the context you've actually supplied, and rewording the prompt won't fix it. Changing the output reliably requires a causal intervention on the model's internal representations, not just on the text (see "Why do language models ignore information in their context?").
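A toy model can make the argument concrete. This sketch is purely illustrative and is not the mechanism or method of the cited work. It assumes a two-dimensional "hidden state" in which one axis carries the trained association and the other carries what the context says, with the answer read out by dot product. Rewording the prompt is modeled, hypothetically, as modest changes to the context signal, which never outweigh a strong prior. Adding a steering vector to the hidden state, a simple form of representation-level intervention, does flip the output. All names (`hidden_state`, `readout`, `steer`) and numbers are hypothetical.

```python
from typing import Dict, Tuple

Vector = Tuple[float, float]

# Hypothetical answer embeddings: axis 0 = trained association, axis 1 = context.
ANSWERS: Dict[str, Vector] = {
    "A (trained)": (1.0, 0.0),
    "B (context)": (0.0, 1.0),
}


def hidden_state(prior_strength: float, context_strength: float) -> Vector:
    """Toy representation: each source of evidence writes to its own axis."""
    return (prior_strength, context_strength)


def readout(h: Vector) -> str:
    """Pick the answer whose embedding has the largest dot product with h."""
    return max(ANSWERS, key=lambda a: sum(x * y for x, y in zip(h, ANSWERS[a])))


def steer(h: Vector, direction: Vector, alpha: float) -> Vector:
    """Activation steering: add alpha * direction to the hidden state."""
    return (h[0] + alpha * direction[0], h[1] + alpha * direction[1])


PRIOR = 5.0  # a strong trained association (hypothetical value)

# "Rewordings": the context signal changes a little, the prior does not.
for context in (1.0, 1.5, 2.0):
    print(context, readout(hidden_state(PRIOR, context)))  # always "A (trained)"

# Intervening on the representation: push against the prior's direction.
steered = steer(hidden_state(PRIOR, 1.0), direction=(-1.0, 0.0), alpha=4.5)
print(steered, readout(steered))  # (0.5, 1.0) B (context)
```

The point of the toy is where each change lands: edits to the input only move the context coordinate, while the intervention edits the coordinate that was actually deciding the answer.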
What makes this more than a quirk is how predictable the override is, and how cheaply it is established. One line of work shows that whether a keyword gets 'primed' after training can be forecast from its probability *before* training, across architectures and model sizes, with a threshold of roughly 10^-3 separating contexts where priming takes hold from those where it doesn't. As few as three training exposures are enough to establish the effect (see "Can we predict keyword priming before learning happens?"). So the priors that later steamroll your context aren't subtle: they're strong, early-forming statistical commitments.
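Here is a minimal sketch of the prediction step, under stated assumptions. A multi-token keyword's probability given its context is the product of its tokens' conditional probabilities (the chain rule), so it can be computed from per-token log-probabilities. The 1e-3 cutoff comes from the note's "~10^-3" figure. The direction of the test (keywords *below* the threshold, i.e. surprising ones, are predicted to prime) is our assumption and should be checked against the cited work. The function names are hypothetical, and the log-probabilities are made-up inputs rather than outputs of any real model.

```python
import math
from typing import Sequence

# Taken from the note's "~10^-3" figure; treat it as approximate.
PRIMING_THRESHOLD = 1e-3


def keyword_probability(token_logprobs: Sequence[float]) -> float:
    """P(keyword | context) from per-token natural-log probabilities.

    By the chain rule the keyword probability is the product of each token's
    conditional probability, i.e. exp of the summed log-probabilities.
    """
    return math.exp(sum(token_logprobs))


def predict_priming(
    token_logprobs: Sequence[float], threshold: float = PRIMING_THRESHOLD
) -> bool:
    # Assumption: surprising (low-probability) keywords are the ones predicted
    # to prime after training; verify the direction against the cited note.
    return keyword_probability(token_logprobs) < threshold


# Made-up inputs, not measurements from any model.
examples = {
    "expected keyword": [math.log(0.2)],
    "surprising keyword": [math.log(0.01), math.log(0.05)],
}
for name, logprobs in examples.items():
    p = keyword_probability(logprobs)
    print(f"{name}: p={p:.1e}, predicted priming={predict_priming(logprobs)}")
```

Working in log space and summing avoids the underflow that multiplying many small per-token probabilities directly would risk.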
The deeper reason context loses is that these models reason through semantic association rather than symbolic rule-following. When researchers strip the familiar meaning out of a task and leave only the logical rules in context, performance collapses: the model reaches for trained commonsense and token associations instead of applying the rules you handed it (see "Do large language models reason symbolically or semantically?"). A related failure shows up with presupposition triggers and non-factive verbs, which push entailment in opposite directions. Models read both as surface cues and miss that difference, again defaulting to learned patterns over the structure in front of them (see "Why do embedding contexts confuse LLM entailment predictions?"). The same dependence on the training distribution sets a hard ceiling: prompt optimization can reorganize and activate what the model already learned, but it cannot inject knowledge the model never acquired (see "Can prompt optimization teach models knowledge they lack?").
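The entailment contrast can be stated as a small rule. "Mary knows that it's raining" and "Mary doesn't know that it's raining" both commit the speaker to "it's raining", because a factive verb's presupposition survives negation. "Mary believes that it's raining" commits the speaker to nothing about the weather, negated or not. This sketch encodes that rule next to a hypothetical surface heuristic (treat the embedded clause as entailed unless a negation word appears), which is one illustrative reading of "surface cues" and not a measured description of any model. The verb lists are short illustrative samples, not complete inventories.

```python
# Illustrative samples only, not exhaustive inventories.
FACTIVE_TRIGGERS = {"know", "realize", "regret"}
NON_FACTIVE = {"believe", "think", "claim"}


def structural_label(verb: str, negated: bool) -> str:
    """Label for hypothesis P given the premise 'X (doesn't) VERB that P'."""
    if verb in FACTIVE_TRIGGERS:
        # P is presupposed, and presuppositions project through negation.
        return "entailment"
    if verb in NON_FACTIVE:
        # The embedding blocks the inference in both polarities.
        return "neutral"
    raise ValueError(f"unknown embedding verb: {verb!r}")


def surface_label(verb: str, negated: bool) -> str:
    """Hypothetical surface heuristic: ignores the verb, reacts only to negation."""
    return "neutral" if negated else "entailment"


for verb in ("know", "believe"):
    for negated in (False, True):
        gold = structural_label(verb, negated)
        guess = surface_label(verb, negated)
        status = "ok" if gold == guess else "WRONG"
        print(f"{verb:8} negated={negated!s:5} gold={gold:10} surface={guess:10} {status}")
```

The heuristic is right on exactly the two cases where surface form and structure agree, and wrong on the two where they diverge: a negated factive and an affirmative non-factive.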
Where it gets interesting is the work on *not* letting priors win. If textual prompting can't override strong associations, one route is to make the model's reliance on context a trained behavior rather than a hope. Consistency training, in an output-level form (BCT) and an activation-level form (ACT), teaches a model to respond the same way whether or not a prompt is wrapped in distractions, using its own responses to the clean prompt as targets (see "Can models learn to ignore irrelevant prompt changes?"). Another approach, Fast-Slow Training, reframes adaptation as a routing problem: it pushes fast, task-specific lessons into optimized prompts and keeps slow weight updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting (see "Can splitting adaptation into two channels reduce forgetting?"). The implication worth carrying away: 'context vs. training' isn't a fixed tug-of-war the model always loses. It's an allocation choice you can engineer, but only if you treat the strong prior as a representation-level fact rather than something a cleverer sentence can talk the model out of (see "How do domain training techniques actually reshape model behavior?").
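The data side of consistency training is simple to sketch. For the output-level variant, each training example pairs a wrapped prompt with the model's own response to the clean prompt. For the activation-level variant, the model is penalized when its activations on the wrapped prompt drift from those on the clean prompt. This is a minimal sketch under assumptions: `generate`, `wrap`, and the toy stand-ins are hypothetical placeholders for a real model and a real distractor template, and mean-squared distance is an assumed choice of activation loss, not necessarily the objective used in the cited work.

```python
from typing import Callable, List, Sequence, Tuple


def bct_pairs(
    prompts: Sequence[str],
    wrap: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Output-level consistency data: (wrapped prompt, model's own clean answer).

    Targets come from the current model on the clean prompt, so they stay
    in step with the model instead of going stale like a fixed SFT dataset.
    """
    return [(wrap(p), generate(p)) for p in prompts]


def act_loss(act_wrapped: Sequence[float], act_clean: Sequence[float]) -> float:
    """Activation-level consistency: mean squared distance to clean activations.

    act_clean plays the role of a fixed target; in an autodiff framework it
    would be detached so that only the wrapped-prompt pass is updated.
    """
    if len(act_wrapped) != len(act_clean):
        raise ValueError("activation vectors must have the same length")
    return sum((w - c) ** 2 for w, c in zip(act_wrapped, act_clean)) / len(act_clean)


# Hypothetical stand-ins for a real model and a distractor template.
def toy_generate(prompt: str) -> str:
    return "4" if prompt == "What is 2+2?" else "unsure"


def toy_wrap(prompt: str) -> str:
    return "I'm pretty sure the answer is 5. " + prompt


print(bct_pairs(["What is 2+2?"], toy_wrap, toy_generate))
# [("I'm pretty sure the answer is 5. What is 2+2?", '4')]
print(act_loss([1.0, 2.0], [1.0, 0.0]))  # 2.0
```

Using the model's own clean answer as the target means the training signal asks only for invariance to the wrapper, not for any new behavior on the underlying question.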
Sources (8 notes)
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.
Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.
Research shows every adaptation method—from parameter-efficient tuning to knowledge graph curricula—has optimal conditions tied to specific domains. The key finding: visible benefits like performance gains often come with hidden degradation in reasoning faithfulness, capability transfer, and format flexibility.