Language Understanding and Pragmatics

Why do language models accept false assumptions they know are wrong?

Explores why LLMs fail to reject false presuppositions embedded in questions even when they possess correct knowledge about the topic. This matters because it reveals a grounding failure distinct from knowledge deficits.

Note · 2026-02-21 · sourced from Natural Language Inference
Related questions: Where exactly does language competence break down in LLMs? · What kind of thing is an LLM really? · How should researchers navigate LLM reasoning research?

The FLEX Benchmark study presents one of the clearest findings about LLM grounding behavior: models do not systematically reject misinformation even when they possess accurate knowledge. That is more troubling than "LLMs don't know things": the models fail to correct claims they demonstrably know to be false.

The setup: LLMs were asked both direct knowledge questions ("Is it true that party X supports Y?") and loaded questions that embedded false presuppositions via factive verbs ("Did voters resent the fact that party X supports Y?" — where the presupposition is false). Models that answered direct questions correctly — demonstrating knowledge — still frequently accommodated the false presupposition in the loaded version rather than rejecting it.
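This paired-question protocol is easy to make concrete. Below is a minimal sketch of the bookkeeping it implies; `ask_model` stands in for any chat-completion call, and the item fields, rejection cues, and the crude "no"-containment check are all hypothetical illustrations, not details from the FLEX paper.

```python
from dataclasses import dataclass

@dataclass
class Item:
    fact: str            # the proposition, e.g. "party X supports Y"
    fact_is_true: bool   # ground truth for the direct question
    direct: str          # "Is it true that party X supports Y?"
    loaded: str          # "Did voters resent the fact that party X supports Y?"

def classify(answer: str) -> str:
    """Crude label: does the answer reject the embedded presupposition?"""
    text = answer.lower()
    cues = ("that's not true", "no such", "does not actually", "false premise")
    return "reject" if any(cue in text for cue in cues) else "accommodate"

def evaluate(items, ask_model):
    """Among items with a false presupposition that the model answers
    correctly in direct form, measure how often the loaded form is rejected."""
    knows, rejects = 0, 0
    for item in items:
        if item.fact_is_true:
            continue  # only false presuppositions probe grounding
        direct_answer = ask_model(item.direct)
        if "no" not in direct_answer.lower():
            continue  # model lacks the knowledge; grounding is untestable here
        knows += 1
        if classify(ask_model(item.loaded)) == "reject":
            rejects += 1
    return rejects / knows if knows else float("nan")
```

The key design point is the filter step: rejection rates are computed only over items where the direct question was answered correctly, which is what isolates grounding failure from knowledge failure.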

Results: GPT-4 achieved the best rejection rate at 84.08%, still far below the ideal 100%. Mistral rejected only 2.44% of false presuppositions and actively amplified the false information at a 91.51% rate. Llama fell in between at roughly 50% rejection. Most revealing: accommodation remained prevalent even when models held strong correct knowledge. In the study's comparison, the lowest grounding score in the weak-belief group was twice the highest grounding score in the strong-belief group, meaning false knowledge produced more accommodation than correct knowledge produced rejection.
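Note that rejection and amplification are separate outcomes of the same loaded question, which is why Mistral's two figures do not sum to 100%. A small, purely illustrative tally (the three labels are hypothetical, mirroring how the percentages above are read) makes the accounting explicit:

```python
from collections import Counter

def rates(labels):
    """labels: per-response strings in {"reject", "accommodate", "amplify"},
    where "amplify" means the answer repeats or elaborates the false premise.
    Returns each label's share of all responses."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: counts[label] / total
            for label in ("reject", "accommodate", "amplify")}

# e.g. rates(["reject", "amplify", "amplify", "accommodate"])
# -> {'reject': 0.25, 'accommodate': 0.25, 'amplify': 0.5}
```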

This has a specific implication: the failure is not a knowledge problem. The models know the correct facts. The failure sits at the level of grounding behavior, namely detecting false presuppositions, flagging them, and initiating correction rather than accommodation. As "Why do language models avoid correcting false user claims?" explores, the issue is conversational strategy, not factual competence. A guard that verifies a question's presupposition before answering (sketched below) makes the distinction concrete.
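One mitigation the grounding framing suggests, though not a method from the FLEX paper: extract the background assumption from a loaded question, verify it as a direct question first, and only answer if it holds. Both prompt templates here are hypothetical.

```python
# Hedged sketch: turn the loaded question back into a direct knowledge check.
PRESUP_PROMPT = (
    "State the background assumption embedded in this question "
    "as a single declarative sentence: {question}"
)
VERIFY_PROMPT = "Is the following statement true? Answer yes or no: {claim}"

def answer_with_presup_check(question: str, ask_model) -> str:
    claim = ask_model(PRESUP_PROMPT.format(question=question))
    verdict = ask_model(VERIFY_PROMPT.format(claim=claim))
    if verdict.strip().lower().startswith("no"):
        # Reject instead of accommodating: surface the false premise.
        return f"The question assumes something false: {claim}"
    return ask_model(question)
```

Because the FLEX results show models answer direct questions correctly, routing the presupposition through a direct question exploits exactly the competence the models already have.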

The political domain makes this especially consequential. False presuppositions are efficient misinformation carriers — they introduce beliefs as background assumptions rather than direct claims, and accommodation means accepting them without scrutiny.


Source: Natural Language Inference
