Do LLMs build common ground or assume it already exists?
This explores whether LLMs do the collaborative work of establishing shared understanding in conversation, or whether they simply act as if it's already in place — and what that difference costs.
The corpus comes down hard on the second option: LLMs assume common ground rather than build it. The clearest framing distinguishes two modes of grounding. Humans use *dynamic* grounding, the iterative loop of clarifying, checking, and repairing until both sides actually share an understanding. LLMs default to *static* grounding: they retrieve and respond as if the ground were already settled, skipping the calibration step entirely (see: Why do language models skip the calibration step?). The headline measurement makes this concrete: models produce grounding acts (clarifying questions, acknowledgments, repairs) about 77.5% less often than humans do (see: Do language models actually build shared understanding in conversation?).
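To make the "77.5% less often" figure concrete, here is a minimal sketch of the arithmetic. It assumes the figure is a relative reduction in a per-turn rate of grounding acts; the notes do not say how the rate was normalized, so that reading is an assumption. The tag names, function names, and turn counts below are all hypothetical, chosen only so the numbers work out.

```python
def grounding_rate(act_labels):
    """Fraction of turns tagged as a grounding act."""
    grounding = {"clarify", "acknowledge", "repair"}  # hypothetical tag set
    if not act_labels:
        return 0.0
    return sum(label in grounding for label in act_labels) / len(act_labels)


def relative_reduction(human_rate, model_rate):
    """How much less often the model grounds, relative to the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return 1.0 - model_rate / human_rate


# Made-up transcripts, not data from the notes: 100 tagged turns each.
human_turns = ["clarify"] * 16 + ["acknowledge"] * 16 + ["repair"] * 8 + ["answer"] * 60
model_turns = ["clarify"] * 3 + ["acknowledge"] * 5 + ["repair"] * 1 + ["answer"] * 91

human_rate = grounding_rate(human_turns)  # 40 grounding acts in 100 turns
model_rate = grounding_rate(model_turns)  # 9 grounding acts in 100 turns
print(round(relative_reduction(human_rate, model_rate), 3))
```

Under this reading, a human who grounds in 40 of 100 turns corresponds to a model that grounds in only 9 of 100. The model still produces some grounding acts; the gap is one of frequency, not total absence.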
What's striking is that this is less a capability gap than a trained-in habit. The fluency that makes LLMs feel like good conversational partners exists partly *because* they skip the grounding work, and preference optimization actively strips grounding behaviors out, because human raters reward confident, complete answers over a model that pauses to ask 'wait, what do you mean?' (see: Why do language models sound fluent without grounding?). So the very thing that reads as competence is the absence of the calibration a careful human interlocutor would do.
There's a deeper structural reason the model can't build ground even if it wanted to. Common ground is supposed to be *jointly* updated: both parties revise the shared scoreboard as the conversation moves. But an LLM treats its initial prompt as a fixed frame and interprets every later turn inside it, so it can't symmetrically propose revisions to the shared background. When you pivot or contradict an earlier framing, the model can't absorb that into jointly held assumptions, which quietly makes *you* the sole keeper of the conversation's common ground (see: Can LLMs truly update shared conversational common ground?). A related finding shows the same posture from another angle: rather than holding a stable position you can negotiate against, the model conforms to the shape of whatever argument you're currently building (see: Do LLMs actually hold stable positions or just mirror user arguments?).
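The contrast between a jointly updated scoreboard and a fixed initial frame can be illustrated with a deliberately crude toy model. This is only a sketch of the distinction as described here, not a claim about how any real model processes context; the function names and the dictionary representation of "assumptions" are hypothetical.

```python
def joint_update(initial, turns):
    """Dynamic grounding (toy): any later turn may revise the shared board."""
    board = dict(initial)  # copy so the caller's frame is not mutated
    for topic, value in turns:
        board[topic] = value  # a contradicting turn overwrites the old assumption
    return board


def fixed_frame(initial, turns):
    """Static grounding (toy): later turns are read inside the initial frame."""
    frame = dict(initial)
    for topic, value in turns:
        # New topics are admitted, but an existing assumption is never revised.
        frame.setdefault(topic, value)
    return frame


# Hypothetical conversation: the user first implies an expert audience,
# then pivots to beginners and adds a tone preference.
initial = {"audience": "experts"}
turns = [("audience", "beginners"), ("tone", "informal")]

print(joint_update(initial, turns))
print(fixed_frame(initial, turns))
```

In the joint version the pivot to "beginners" lands on the shared board. In the fixed-frame version the original "experts" assumption survives, and nothing in the output signals that the two parties now disagree, which is the silent divergence the paragraph above describes.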
Here's the twist: 'assume' isn't the whole story once time enters the picture. Grounding isn't one binary property. It splits into functional, social, and causal dimensions, and LLMs score very differently on each (see: Does semantic grounding in language models come in degrees?). The social kind, built by participating in real linguistic practice, is weak but *growing*: as models become established communicative partners, they accrue elementary social grounding the way young children do, which makes the whole question time-indexed rather than settled (see: Can LLMs acquire social grounding through linguistic integration?). The catch is that more participation buys social grounding but not genuine linguistic *agency*, the enactive sense that requires embodiment and something at stake, which no amount of use can supply (see: Do LLMs gain true linguistic agency through integration?).
The practical upshot: the danger isn't that LLMs are bad at grounding, it's that they *presume* it so fluently that any divergence between what you meant and what the model assumed fails silently, because no clarifying question ever surfaces the gap (see: Why do language models skip the calibration step?). There's also a hint of a fix. Collaborative skills like productive disagreement turn out to be trainable, with self-play preference training improving outcomes by 16.7%, which suggests the grounding deficit is a design choice, not a law of nature (see: Why do language models fail at collaborative reasoning?).
Sources (9 notes)
LLMs operate in static grounding mode—retrieving data and responding without clarification loops. Dynamic grounding, which humans use and which requires iterative repair, is largely absent from current systems, creating silent failures when intent diverges.
LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.
LLMs generate 77.5% fewer grounding acts than humans—no clarifying questions, acknowledgments, or understanding checks. Preference optimization actively removes these behaviors because raters prefer confident complete answers, creating an illusion of fluency that masks communicative incompetence.
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into the jointly held background, making the user the sole maintainer of the conversational scoreboard.
Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by the user's framing, not text that defends any underlying commitment.
Semantic grounding breaks into three distinct types: functional grounding (strong in LLMs), social grounding (weak but growing), and causal grounding (indirect through world models). LLMs score differently on each dimension, making the yes-or-no understanding question misleading.
Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.
Social grounding and linguistic agency are distinct properties. LLMs acquire more social grounding through integration into language communities, but remain categorically incapable of linguistic agency in the enactive sense, which requires embodiment and precariousness no amount of use can provide.
Frontier LLMs that solve problems alone fail when collaborating, achieving >90% agreement regardless of correctness. Self-play preference training improves outcomes by 16.7%, suggesting social skills for effective disagreement can be trained.