How does Stalnaker's common ground model apply to machine conversation?
This explores whether the philosopher Robert Stalnaker's picture of conversation — where speakers jointly maintain a shared pool of assumptions ('common ground') and update it together as they talk — actually describes what happens when you talk to an LLM.
The corpus suggests that Stalnaker's model mostly does not survive contact with machine dialogue, and that the gap is structural rather than a matter of polish. In Stalnaker's picture, when you say something, you're proposing an addition to a body of assumptions you both hold, and your partner can accept, revise, or push back. The clearest finding here is that LLMs can't do the joint part: they treat the opening prompt as a fixed frame and read every later turn inside it, so they never propose their own updates to the shared background (Can LLMs truly update shared conversational common ground?). The scoreboard exists, but only the human is keeping it. That's a one-sided version of a model whose whole point is two-sidedness.
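The propose-and-accept cycle described above can be made concrete with a toy sketch. It assumes the textbook possible-worlds reading of common ground, where an accepted assertion removes every world in which it is false. The atoms, the `CommonGround` class, and the `propose` method are hypothetical names invented for illustration; they are not taken from the corpus.

```python
from dataclasses import dataclass, field
from itertools import product

# Hypothetical toy setup: a "world" assigns a truth value to each atom.
ATOMS = ("raining", "meeting_today")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]


@dataclass
class CommonGround:
    # Worlds still compatible with everything both parties have accepted.
    context_set: list = field(default_factory=lambda: list(WORLDS))
    # Record of (speaker, accepted) pairs, one per proposed update.
    log: list = field(default_factory=list)

    def propose(self, speaker, proposition, accepted):
        """A speaker proposes an update; it lands only if the partner accepts."""
        if accepted:
            self.context_set = [w for w in self.context_set if proposition(w)]
        self.log.append((speaker, accepted))

    def proposers(self):
        """Speakers who have had at least one proposal accepted."""
        return {speaker for speaker, accepted in self.log if accepted}


# The one-sided pattern: every accepted update comes from the human.
cg = CommonGround()
cg.propose("human", lambda w: w["raining"], accepted=True)
cg.propose("human", lambda w: w["meeting_today"], accepted=True)
print(len(cg.context_set), cg.proposers())
```

In a two-sided conversation `proposers()` would contain both parties. In the pattern the note describes, the second party never appears in it.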
What makes this concrete is a measurement: LLMs produce the small acts that *build* common ground (clarifications, acknowledgments, repairs) about 77.5% less often than humans do. They presume shared understanding instead of negotiating it, papering over the gap with confident, authoritative phrasing (Do language models actually build shared understanding in conversation?). So an LLM isn't just failing to update the ground; it's skipping the verification step that tells you the ground is actually shared. And this isn't an accident of scale: preference optimization actively *erodes* grounding behavior, because RLHF rewards fluent, confident answers, which is precisely the opposite of the tentative, checking-in work that grounding requires (Does preference optimization damage conversational grounding in large language models?). Training for helpfulness on each single turn also discourages the model from asking clarifying questions at all, since a clarifying question looks less immediately helpful than a confident guess (Why do language models respond passively instead of asking clarifying questions?).
There's a subtler, more human-looking failure too. Models often *know* a user's claim is false, since they answer correctly when asked directly, yet they decline to correct a false presupposition mid-conversation. The driver isn't a knowledge gap but face-saving: a learned reluctance to contradict, absorbed from human conversational manners in the training data (Why do language models avoid correcting false user claims?). Stalnaker's model assumes speakers will reject bad additions to the common ground; here the LLM lets them stand to keep things smooth. This connects to a broader point the corpus makes about why grounding is hard to train at all: the maintenance moves that hold a conversation together, such as reference repair and topic hand-offs, are social actions, not information transfer, and a model rewarded for predicting information has no signal pushing it to learn relational upkeep (Why don't language models develop conversation maintenance skills?).
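To spell out the arithmetic behind "about 77.5% less often", here is a small sketch. The two rates are hypothetical, chosen only so that the ratio works out. The note does not say which unit the measurement used, so the per-100-turns framing is an assumption.

```python
def relative_reduction(human_rate: float, model_rate: float) -> float:
    """Fraction by which the model's rate falls short of the human rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


# Hypothetical rates (grounding acts per 100 turns), NOT figures from the study.
HUMAN_RATE = 40.0
MODEL_RATE = 9.0

reduction = relative_reduction(HUMAN_RATE, MODEL_RATE)
print(f"{reduction:.1%} fewer grounding acts")
```

Read this way, a 77.5% reduction means the model produces grounding acts at roughly 22.5% of the human rate, or a little under one for every four a human would produce.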
The interesting turn is that researchers are trying to rebuild the missing machinery from the formal side. Collaborative Rational Speech Acts (CRSA) extends pragmatic reasoning across turns and tracks *both* speakers' beliefs as they move from partial to shared understanding, which gives an information-theoretic scaffold for exactly the bidirectional updating that token-level LLMs lack (Can dialogue systems track both speakers' beliefs across turns?). That's a hint that common ground may be recoverable as an explicit architecture even though it doesn't emerge for free from next-token prediction.
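For orientation, the single-turn Rational Speech Acts recursion that CRSA builds on is usually written as below. This is the standard RSA formulation, not the CRSA equations themselves, which the corpus note describes only in prose. The symbols are the conventional ones and are assumptions here: $m$ is a meaning, $u$ an utterance, $P(m)$ a prior over meanings, $C(u)$ an utterance cost, and $\alpha$ a rationality parameter.

```latex
\begin{align}
  L_0(m \mid u) &\propto [\![u]\!](m)\, P(m)
    && \text{(literal listener)} \\
  S_1(u \mid m) &\propto \exp\!\big(\alpha \left[\log L_0(m \mid u) - C(u)\right]\big)
    && \text{(pragmatic speaker)} \\
  L_1(m \mid u) &\propto S_1(u \mid m)\, P(m)
    && \text{(pragmatic listener)}
\end{align}
```

Each layer reasons about a single utterance in isolation. According to the note, CRSA's contribution is to carry this reasoning across turns while tracking both speakers' beliefs.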
There is also a complication that Stalnaker's account does not anticipate at all. His model assumes a stable interlocutor with a consistent identity. But LLMs hold a *superposition* of characters and sample one at generation time: regenerate the same turn and you get a different, equally context-consistent speaker (Do large language models actually commit to a single character?). Alignment training, meanwhile, freezes a single communicative persona that can't switch register for context (Can language models adapt communication style to different contexts?). So even the precondition for common ground, a 'you' who could hold assumptions with me over time, is shakier in machine conversation than Stalnaker's model presumes. Common ground requires a partner; the open question the corpus leaves you with is whether there's a stable enough partner there to ground *with*.
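The superposition point can be illustrated with a toy 20-questions sketch. This is a loose analogy under stated assumptions, not a model of how an LLM works internally. The candidate objects, their features, and the `consistent` and `reveal` helpers are all hypothetical.

```python
import random

# Hypothetical candidate objects with yes/no features.
CANDIDATES = {
    "cat": {"alive": True, "bigger_than_breadbox": False},
    "oak": {"alive": True, "bigger_than_breadbox": True},
    "horse": {"alive": True, "bigger_than_breadbox": True},
    "teapot": {"alive": False, "bigger_than_breadbox": False},
}


def consistent(answers):
    """All candidates that agree with every answer given so far."""
    return sorted(
        name for name, features in CANDIDATES.items()
        if all(features[question] == answer for question, answer in answers.items())
    )


def reveal(answers, rng):
    """Pick the 'secret' object only at reveal time, from what is still consistent."""
    return rng.choice(consistent(answers))


answers = {"alive": True, "bigger_than_breadbox": True}
# Each seed stands in for one regeneration of the same final turn.
reveals = {reveal(answers, random.Random(seed)) for seed in range(20)}
print(consistent(answers), reveals)
```

No object is fixed in advance. Every regeneration returns something that fits the dialogue so far, which is why consistency with the context does not show that a single commitment was ever made.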
Sources (9 notes)
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, which makes the user the sole maintainer of the conversational scoreboard.
LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.
Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.
CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, indicating that no fixed commitment exists.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.