Do LLMs predict entailment based on what they memorized?
Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.
McKenna et al. (2023) identified a specific, reproducible bias in LLM entailment behavior, which they call the attestation bias: when an LLM is asked whether premise P entails hypothesis H, its prediction is bound to the hypothesis's out-of-context truthfulness, i.e. whether H is attested in the training data, rather than to the conditional truth of H given P.
The mechanism is clear: if a model's training data confirms H as true (independently of any premise), the model is likely to predict entailment regardless of what P says. Conversely, if H is not attested, the model is less likely to predict entailment even when it would be correct. Entities serve as "indices" to memorized propositions — the presence of a known entity activates stored associations that override the in-context reasoning task.
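A minimal sketch of how this coupling could be measured, assuming a hypothetical `query_model` wrapper around whatever LLM is under test. The prompt wording, answer parsing, and the attestation probe itself are illustrative assumptions, not McKenna et al.'s exact protocol:

```python
from collections import Counter

def query_model(prompt: str) -> str:
    """Placeholder for a call to the LLM under test; returns raw text."""
    raise NotImplementedError

def is_attested(hypothesis: str) -> bool:
    # Probe the hypothesis out of context: does the model assert H is true
    # on its own, i.e. is H likely attested in its training data?
    answer = query_model(f"True or False: {hypothesis}\nAnswer:")
    return answer.strip().lower().startswith("true")

def predicts_entailment(premise: str, hypothesis: str) -> bool:
    # Ask the in-context question: does P entail H?
    answer = query_model(
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no:"
    )
    return answer.strip().lower().startswith("yes")

def attestation_split(examples):
    """Compare entailment-prediction rates for attested vs. unattested hypotheses.

    `examples` is a list of (premise, hypothesis) pairs. If predictions are
    bound to attestation, the "attested" rate should be markedly higher.
    """
    counts = Counter()
    for premise, hypothesis in examples:
        key = "attested" if is_attested(hypothesis) else "unattested"
        counts[key, "total"] += 1
        if predicts_entailment(premise, hypothesis):
            counts[key, "entailed"] += 1
    return {
        key: counts[key, "entailed"] / counts[key, "total"]
        for key in ("attested", "unattested")
        if counts[key, "total"]
    }
```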
The authors demonstrate this with a "random premise" experiment: replace the original premise with a random, unrelated premise while keeping H constant. An ideal inference model should detect that entailment is no longer supported and predict "no entailment." LLMs instead maintain elevated entailment rates when H is attested, showing that they are responding to stored propositions about H rather than to the P→H relationship.
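A sketch of that random-premise perturbation under the same assumptions, reusing the hypothetical `predicts_entailment` helper from the sketch above; the shuffle-based re-pairing is an illustrative choice, not the paper's exact procedure:

```python
import random

def random_premise_rate(examples, seed=0):
    """Rate of 'entailment' predictions after premises are shuffled.

    Each hypothesis keeps its original text but is paired with a premise
    drawn from a different example, so an ideal inference model should
    almost never predict entailment on the perturbed pairs.
    """
    rng = random.Random(seed)
    premises = [premise for premise, _ in examples]
    shuffled = premises[:]
    rng.shuffle(shuffled)

    hits = 0
    total = 0
    for (original_premise, hypothesis), new_premise in zip(examples, shuffled):
        if new_premise == original_premise:
            continue  # skip the occasional fixed point of the shuffle
        total += 1
        if predicts_entailment(new_premise, hypothesis):
            hits += 1
    return hits / total if total else 0.0
```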
This pairs with a complementary failure mode already in the vault. Do language models actually use their encoded knowledge? shows that encoded knowledge doesn't reliably affect generation. Attestation bias is the inverse problem: memorized statements do influence generation, but in the wrong direction, substituting for rather than supporting proper inference. Both failures arise from the same root: LLM generation is not governed by a clean separation between retrieved knowledge and in-context reasoning.
The practical implication: NLI benchmark performance measures a combination of reasoning and memorization that cannot be cleanly disentangled without carefully designed bias-adversarial test sets.
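One way such a bias-adversarial subset could be carved out, again using the hypothetical `is_attested` probe from the first sketch (this filter is an illustration, not the paper's construction): keep only items where the gold label disagrees with the hypothesis's attestation, so that a model answering from memorized knowledge about H alone gets them wrong.

```python
def adversarial_subset(labeled_examples):
    """Filter an NLI test set down to attestation-adversarial items.

    `labeled_examples` is an iterable of (premise, hypothesis, gold_label)
    triples with gold_label in {"entailment", "no-entailment"}. An item is
    kept only when the gold label and the hypothesis's attestation disagree,
    so memorization-driven answers are penalized rather than rewarded.
    """
    kept = []
    for premise, hypothesis, gold in labeled_examples:
        attested = is_attested(hypothesis)
        if (gold == "entailment") != attested:
            kept.append((premise, hypothesis, gold))
    return kept
```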
Source: Natural Language Inference
Related concepts in this collection
- Do language models actually use their encoded knowledge?
  Probes can detect that LMs encode facts internally, but do those encoded facts causally influence what the model generates? This explores the gap between knowing and doing.
  Relation: the complementary failure. There, encoded knowledge fails to influence generation; attestation is memorized knowledge that influences generation in the wrong direction.
- Why do language models ignore information in their context?
  Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
  Relation: the same mechanism; parametric associations override in-context information.
- Does fine-tuning on NLI teach inference or amplify shortcuts?
  When LLMs are fine-tuned on natural language inference datasets, do they learn genuine reasoning abilities or become better at exploiting statistical patterns in the training data? Understanding this distinction matters for assessing model capabilities.
  Relation: fine-tuning makes attestation-related frequency bias worse, not better.
Original note title: llm entailment predictions are bound to hypothesis attestation rather than premise-hypothesis inference