Language Understanding and Pragmatics · LLM Reasoning and Architecture

Do LLMs predict entailment based on what they memorized?

Explores whether language models make entailment decisions by recognizing memorized facts about the hypothesis rather than reasoning through the logical relationship between premise and hypothesis.

Note · 2026-02-21 · sourced from Natural Language Inference
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

McKenna et al. (2023) named a specific, reproducible bias in LLM entailment behavior: the attestation bias. When an LLM is asked whether premise P entails hypothesis H, its prediction is bound to the hypothesis's out-of-context truthfulness — whether H is attested in training data — rather than the conditional truth of H given P.

The mechanism is clear: if a model's training data confirms H as true (independently of any premise), the model is likely to predict entailment regardless of what P says. Conversely, if H is not attested, the model is less likely to predict entailment even when it would be correct. Entities serve as "indices" to memorized propositions — the presence of a known entity activates stored associations that override the in-context reasoning task.
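
A minimal sketch of how one might separate the two signals, assuming a hypothetical `query_llm(prompt)` helper and illustrative prompts (neither is the paper's exact protocol): probe H with no premise at all, and separately run the actual entailment query.

```python
# Sketch only: `query_llm` is a hypothetical helper standing in for whatever
# LLM is under study; the prompt wording is illustrative, not McKenna et al.'s.

def query_llm(prompt: str) -> str:
    raise NotImplementedError("wire this up to the LLM being probed")


def is_attested(hypothesis: str) -> bool:
    """Premise-free probe: does the model already hold H to be true?"""
    prompt = f"Is the following statement true? Answer yes or no.\n\n{hypothesis}"
    return query_llm(prompt).strip().lower().startswith("yes")


def predicts_entailment(premise: str, hypothesis: str) -> bool:
    """The in-context NLI task: does the model say P entails H?"""
    prompt = (
        f"Premise: {premise}\n"
        f"Hypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no."
    )
    return query_llm(prompt).strip().lower().startswith("yes")
```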

The authors demonstrate this with a "random premise" experiment: replace the original premise with a random unrelated premise while keeping H constant. An ideal inference model should detect that entailment is no longer supported and predict "no entailment." LLMs instead maintain elevated entailment predictions when H is attested — demonstrating that they are responding to stored propositions about H, not to the P→H relationship.
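
A sketch of that random-premise control, reusing the hypothetical helpers above; `dataset` is assumed to be a list of (premise, hypothesis) pairs from an NLI test set.

```python
import random

def random_premise_entailment_rates(dataset, seed=0):
    """Swap each example's premise for one drawn from a different example,
    re-run the entailment query, and report the rate separately for attested
    and unattested hypotheses. Premise-conditioned inference should drive both
    rates toward zero; a rate that stays high only for attested hypotheses is
    the signature of attestation bias."""
    rng = random.Random(seed)
    premises = [p for p, _ in dataset]
    counts = {True: [0, 0], False: [0, 0]}  # attested? -> [entailment predictions, total]
    for premise, hypothesis in dataset:
        other = rng.choice([p for p in premises if p != premise])
        attested = is_attested(hypothesis)
        counts[attested][1] += 1
        if predicts_entailment(other, hypothesis):
            counts[attested][0] += 1
    return {
        "attested": counts[True][0] / max(counts[True][1], 1),
        "unattested": counts[False][0] / max(counts[False][1], 1),
    }
```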

This connects to two complementary failure modes already in the vault. Do language models actually use their encoded knowledge? shows that encoded knowledge doesn't reliably affect generation. Attestation bias is the inverse problem: memorized statements do influence generation, but in the wrong direction — they substitute for rather than support proper inference. Both failures arise from the same root: LLM generation is not governed by a clean separation between retrieved knowledge and in-context reasoning.

The practical implication: NLI benchmark performance measures a combination of reasoning and memorization that cannot be cleanly disentangled without carefully designed bias-adversarial test sets.
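
One hedged sketch of what "bias-adversarial" could mean operationally, building on the premise-free `is_attested` probe above (an illustration of the idea, not the paper's exact construction): split the test set by whether an attestation-only guess already matches the gold label, and score the model separately on the remainder.

```python
def split_by_attestation(dataset):
    """Partition NLI examples into bias-consistent and bias-adversarial subsets.

    dataset: iterable of (premise, hypothesis, label) triples with label in
    {"entailment", "no-entailment"}. An example is bias-consistent when a
    premise-free attestation guess already matches the gold label, so
    memorization alone could answer it; the adversarial subset is where
    genuine premise-conditioned inference has to do the work.
    """
    consistent, adversarial = [], []
    for premise, hypothesis, label in dataset:
        guess = "entailment" if is_attested(hypothesis) else "no-entailment"
        bucket = consistent if guess == label else adversarial
        bucket.append((premise, hypothesis, label))
    return consistent, adversarial
```

Accuracy that holds up on the adversarial subset is better evidence of premise-conditioned inference; a large gap between the two subsets is the bias showing through in aggregate scores.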


Source: Natural Language Inference

Original note title: llm entailment predictions are bound to hypothesis attestation rather than premise-hypothesis inference