Why do language models ignore information in their context?
Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
The REMEDI paper names a specific failure mode: "failure of context integration." The example: an LM is prompted with a context establishing that Anita works in a law office, but when generating a continuation, the LM describes Anita as a nurse — overriding the contextual information with a prior association (names like Anita may statistically co-occur with certain occupations in training data).
This is a named, empirically documented failure mode, not a hypothetical. The failure occurs because the LM's parametric knowledge (compressed into weights from training) and its in-context information (the prompt) are not cleanly integrated. When they conflict, the parametric association can win.
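To make this concrete, here is a minimal sketch (Python, Hugging Face transformers) of how one might check whether a model's next-word prediction follows the context or the prior. The model choice (gpt2), the prompt wording, and the two candidate occupations are illustrative assumptions, not the REMEDI paper's evaluation setup.

```python
# Minimal sketch: does the model's next-token distribution follow the context
# ("law office" -> lawyer) or a prior association (-> nurse)?
# Model, prompt, and candidate words are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Anita works in a law office. Anita's profession is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    next_token_logits = model(**inputs).logits[0, -1]  # logits for the next token
probs = torch.softmax(next_token_logits, dim=-1)

for candidate in [" lawyer", " nurse"]:
    token_id = tok.encode(candidate)[0]  # first subword of the candidate
    print(f"P('{candidate.strip()}') = {probs[token_id].item():.4f}")

# If the context-inconsistent occupation still gets comparable or higher
# probability, the parametric prior is overriding the in-context information.
```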
The implication is important for how we think about context windows and RAG-style augmentation. Just providing information in context does not guarantee that a model will use it. If the information conflicts with strong prior associations, the prior may dominate — not because the model misread the context, but because context integration is not a lossless operation. The provided information gets processed through the same mechanisms that already have strong priors.
Fixing this requires causal intervention, not just better prompting: you need to modify the representations that carry the prior association rather than stacking more context on top of them. This is what REMEDI demonstrates: a learned vector added directly to the entity's representations can override the prior in a way that textual prompting cannot.
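Below is a schematic sketch of that style of intervention: an edit vector is added to the hidden states at the entity's token positions via a forward hook, and generation proceeds from the edited representation. The layer index, the zero placeholder vector, and the entity-span logic are assumptions for illustration only; REMEDI itself learns the editor that produces the vector from entity and attribute text, which this sketch does not reproduce.

```python
# Schematic sketch of a REMEDI-style intervention: add an edit vector to the
# hidden state at the entity's token positions in one transformer layer.
# Layer index and the zero placeholder vector are assumptions; the real method
# learns the edit vector from (entity, attribute) pairs.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

LAYER = 8  # assumed intervention layer
prompt = "Anita works in a law office. Anita's profession is"
enc = tok(prompt, return_tensors="pt", return_offsets_mapping=True)
inputs = {"input_ids": enc["input_ids"], "attention_mask": enc["attention_mask"]}
prompt_len = enc["input_ids"].shape[1]

# Token positions whose character span overlaps the first "Anita" mention (chars 0-5).
entity_positions = [
    i for i, (s, e) in enumerate(enc["offset_mapping"][0].tolist()) if s < 5 and e > 0
]

# Placeholder edit vector; REMEDI would produce this with a learned editor.
edit_vector = torch.zeros(model.config.hidden_size)

def add_edit(module, module_inputs, output):
    hidden = output[0]
    # Edit only on the pass that sees the full prompt; later cached steps see one token.
    if hidden.shape[1] == prompt_len:
        hidden[:, entity_positions, :] = hidden[:, entity_positions, :] + edit_vector
    return (hidden,) + output[1:]

handle = model.transformer.h[LAYER].register_forward_hook(add_edit)
try:
    out = model.generate(**inputs, max_new_tokens=10, do_sample=False)
finally:
    handle.remove()

print(tok.decode(out[0]))
```

With a zero vector the generation is unchanged; the point of the sketch is where the intervention happens (specific layer, specific entity positions), which is what distinguishes this from simply adding more text to the prompt.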
Source: Discourses
Related concepts in this collection
- Do language models actually use their encoded knowledge? Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing. Relation: the complementary failure, where even information that is correctly encoded may not causally influence output.
- Do classical knowledge definitions apply to AI systems? Classical definitions of knowledge assume truth-correspondence and a human knower. Do these assumptions hold for LLMs and distributed neural knowledge systems, or do they need fundamental revision? Relation: context integration failure is part of why "LLM knowledge" is not propositional knowledge.
- Do language models actually build shared understanding in conversation? When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable. Relation: context integration failure at the representational level surfaces as presumption of common ground at the communicative level; both reflect the same absence of bidirectional grounding.
Original note title
llm context integration fails when prior training associations override current context information