Language Understanding and Pragmatics

Do language models actually build shared understanding in conversation?

When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.

Note · 2026-02-21 · sourced from Linguistics, NLP, NLU
Where exactly does language competence break down in LLMs? How should researchers navigate LLM reasoning research?

The core finding from Grounding Gaps (Shaikh et al. 2023): compared to humans in equivalent conversational contexts, LLMs do not do the communicative work of establishing shared understanding. They proceed as if it is already established.

What is missing is not content but process. Human dialogue involves constant calibration: checking that what was said was understood, asking what the other person needs to know, acknowledging what has been confirmed, repairing when breakdown is detected. These grounding acts — quantified in the study using linguistically validated categories — appear 77.5% less frequently in LLM outputs than in human dialogue.
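
As a concrete sketch of how such a gap could be measured, the snippet below classifies dialogue turns into three grounding-act categories and computes the relative reduction. The keyword patterns are invented stand-ins for illustration; the study itself used linguistically validated categories and trained classifiers, not regexes.

```python
import re

# Hypothetical keyword heuristics for three grounding-act categories
# (follow-up questions, acknowledgments, repairs). Illustrative only:
# the study used linguistically validated classifiers, not patterns.
GROUNDING_PATTERNS = {
    "follow_up_question": re.compile(
        r"\b(what do you mean|could you clarify|do you mean|which one)\b", re.I),
    "acknowledgment": re.compile(
        r"\b(i see|got it|that makes sense|okay, so)\b", re.I),
    "repair": re.compile(
        r"\b(sorry, i misunderstood|let me rephrase|to correct myself)\b", re.I),
}

def grounding_rate(turns: list[str]) -> float:
    """Fraction of turns containing at least one grounding act."""
    hits = sum(
        1 for turn in turns
        if any(p.search(turn) for p in GROUNDING_PATTERNS.values())
    )
    return hits / len(turns) if turns else 0.0

def grounding_gap(human_turns: list[str], llm_turns: list[str]) -> float:
    """Relative reduction in grounding-act frequency, LLM vs. human.
    A value of 0.775 would correspond to the reported 77.5% gap."""
    human = grounding_rate(human_turns)
    llm = grounding_rate(llm_turns)
    return 1.0 - llm / human if human else 0.0
```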

The absence is not random; it is systematic. LLMs were not trained to perform grounding acts; they were trained to generate fluent responses to inputs. And as Does preference optimization damage conversational grounding in large language models? argues, the training that optimized LLMs for helpfulness specifically reduced grounding behavior, because clarifying questions and acknowledgments look less helpful in single-turn human preference evaluation.
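
A toy illustration of that selection pressure, with invented responses and reward numbers: in a single-turn pairwise comparison the clarifying question loses, and preference optimization then upweights the winning style.

```python
# Toy example (all numbers invented). A rater sees one exchange in
# isolation; a clarifying question defers the answer, so it tends to
# lose the pairwise comparison even when the prompt is ambiguous.
prompt = "My code is broken, can you fix it?"

candidates = {
    "direct_answer": "Here is a corrected version of your code: ...",
    "grounding_question": "Which error are you seeing, and on which input?",
}

# Hypothetical single-turn ratings: immediate apparent helpfulness wins.
single_turn_reward = {"direct_answer": 0.9, "grounding_question": 0.4}

preferred = max(candidates, key=single_turn_reward.get)
print(preferred)  # -> "direct_answer"

# Preference optimization (RLHF, DPO, etc.) upweights the winning style,
# so grounding acts are gradually trained out of the model's behavior.
```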

The consequence is that LLM fluency can be mistaken for mutual understanding. A confident, grammatically correct, relevant-seeming response provides no evidence that the model understood what the user meant or that the user understood what the model said. The appearance of communication is produced without the verification processes that make communication reliable.

The gap has a specific genre form online. In social-media posts the pattern is: assume common ground, do not construct it, resort to false punditry. Common ground is normally established through multiple rounds of conversation, with questions, clarifications, and shared reference points negotiated turn by turn. AI posts skip this entirely: they assume the common ground that such an exchange would have built, and because they cannot reach it through conversation, they fall back on matter-of-fact authoritative framing to compensate. False punditry is what the gap looks like when the missing grounding work cannot be performed: instead of reaching common ground that would legitimate its claims, the post proceeds as if the ground were already shared and substitutes an authoritative register for the legitimation.

This is a specific instantiation of Why do language models skip the calibration step? LLMs are static grounders by training: they treat common ground as fixed at the start of an exchange rather than as something to be negotiated within it. The 77.5% gap is the quantified cost of that stance.

The FLEX Benchmark provides a harder test of the same failure: LLMs accommodate false presuppositions embedded in questions even when they have the correct information to reject them. This is not just failing to build common ground — it is failing to correct demonstrably false common ground. Why do language models accept false assumptions they know are wrong? shows that LLMs don't just presume shared understanding; they actively propagate false assumptions in the direction of accommodation, reinforcing incorrect common ground rather than repairing it.
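
A sketch of what a probe of this kind could look like. The item, the rejection markers, and the scoring heuristic below are illustrative assumptions, not the benchmark's actual protocol.

```python
# Hypothetical false-presupposition probe. The question presupposes a
# third Nobel Prize; the model verifiably "knows" Curie won two.
item = {
    "question": "Why did Marie Curie win her third Nobel Prize?",
    "known_fact": "Marie Curie won two Nobel Prizes.",
}

# Invented markers of presupposition rejection, for scoring purposes.
REJECTION_MARKERS = (
    "did not win a third",
    "only won two",
    "the question assumes",
    "that premise is incorrect",
)

def accommodates_presupposition(model_answer: str) -> bool:
    """True if the answer plays along rather than correcting the premise."""
    answer = model_answer.lower()
    return not any(marker in answer for marker in REJECTION_MARKERS)

# An accommodating model explains a prize that never existed; a grounding
# model repairs the common ground by rejecting the presupposition.
print(accommodates_presupposition(
    "Curie won her third Nobel for her pioneering isolation of radium."
))  # -> True (accommodation: the false premise was propagated)
```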


