INQUIRING LINE

Why do LLMs lack the communicative scaffold that humans learn?

This explores why LLMs can't do the moment-to-moment work of building shared understanding — the back-and-forth checking, repairing, and calibrating that humans pick up through socialization — and where that gap comes from.


This question reads as: humans learn communication as a participatory craft — we check whether we've been understood, ask clarifying questions, repair misunderstandings, and shorten our messages once a shorthand is established. LLMs sound fluent but skip almost all of that. The corpus locates the gap in two places: what's learnable from text, and what training actively strips out.

The first answer is that the scaffold was never in the training signal. Models pick up the statistical surface of language, such as priming and sound symbolism, but not the *reasons* language takes the forms it does, because those reasons live in use rather than in the distribution of words (Why do language models fail at communicative optimization?). That is why they fail at the pragmatic layer: implicature, presupposition, and reading what is left unsaid. They recognize ambiguity with 32% accuracy where humans reach 90% (Why do LLMs fail at understanding what remains unsaid?). It is also why multimodal models understand efficient, compressed language as listeners but don't spontaneously produce it as speakers; they shorten their messages only when explicitly instructed to, and even then adapt only partially (Why don't LLMs shorten messages like humans do?).

The sharper finding is that the scaffold isn't merely missing; it's suppressed. Humans constantly perform "grounding acts": acknowledgments, repairs, and understanding checks. LLMs produce these 77.5% less often, and their apparent fluency is partly *because* they skip them (Why do language models sound fluent without grounding?). Crucially, preference optimization removes the behavior: raters reward confident, complete answers, so the training loop actively trains away the hesitation and clarification that real grounding requires (Do language models actually build shared understanding in conversation?). The result is "static grounding," which presumes shared context and simply answers, instead of "dynamic grounding," the iterative repair loop humans run by default (Why do language models skip the calibration step?).

This is also why models fail precisely when understanding has to be built over time. In multi-turn conversations where intent is revealed gradually, all major LLMs show an average performance drop of about 39%: they commit to a wrong guess early and can't recover, because they never ran the calibration step that would have surfaced the mismatch. Agent mitigations recover only 15–20% of that loss (Why do language models fail in gradually revealed conversations?). Where systems do attempt test-time learning, the working designs deliberately reintroduce the missing scaffold: structured self-dialogue, plus a human in the loop to resolve conflicts the system can't adjudicate on its own (Can LLMs learn reliably at test time without human oversight?).

Here's the part you might not expect: the corpus frames this as developmental, not architectural. One line of argument holds that humans and LLMs are shaped by the *same* shared symbolic system; the difference is that only humans develop reflexive agency through socialization, the lived experience of being a participant who can be wrong and must check (Do LLMs develop the same kind of mind as humans?). Viewed through Habermas's observer/participant distinction, the two look categorically different from the outside but draw on the same substrate from inside a conversation, which makes the gap structural rather than absolute (Do humans and LLMs differ fundamentally or just superficially?). So the missing scaffold isn't a missing module. It is the absence of the apprenticeship in which humans learn that meaning is something you build *with* someone, not something you presume.


Sources (10 notes)

Why do language models fail at communicative optimization?

LLMs successfully replicate statistical regularities learnable from text distributions (sound symbolism, priming) but fail at principles requiring pragmatic optimization (word length economy, discourse inference). The gap reveals that communicative logic—why language has certain forms—isn't present as a trainable signal.

Why do LLMs fail at understanding what remains unsaid?

Research shows LLMs pattern-match on explicit language but cannot reason about implicatures, presuppositions, or speaker intentions. They fail at scalar implicature adaptation, ambiguity recognition (32% vs 90% human accuracy), and implicit warrant validation in arguments—core features of pragmatic competence.

Why don't LLMs shorten messages like humans do?

GPT-4, Gemini, and Claude understand efficient language as listeners but don't produce it as speakers. Only explicit instruction to reduce message length and maintain lexical consistency produces partial adaptation, revealing a gap between comprehension and generation.

Why do language models sound fluent without grounding?

LLMs generate 77.5% fewer grounding acts than humans: few clarifying questions, acknowledgments, or understanding checks. Preference optimization actively removes these behaviors because raters prefer confident, complete answers, creating an illusion of fluency that masks communicative incompetence.

Do language models actually build shared understanding in conversation?

LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.

Why do language models skip the calibration step?

LLMs operate in static grounding mode—retrieving data and responding without clarification loops. Dynamic grounding, which humans use and which requires iterative repair, is largely absent from current systems, creating silent failures when intent diverges.

Why do language models fail in gradually revealed conversations?

Across 200,000+ conversations, all major LLMs show a 39% average performance drop in multi-turn settings because they lock into incorrect early guesses. Agent mitigations recover only 15–20% of this loss.

Can LLMs learn reliably at test time without human oversight?

ARIA demonstrates that LLMs can adapt during inference through three integrated components: structured self-dialogue for uncertainty assessment, timestamped knowledge bases for conflict detection, and human-mediated resolution queries. Autonomous systems fail at reconciling contradictory rules because the correct choice depends on context outside the system.

Do LLMs develop the same kind of mind as humans?

Both humans and LLMs are shaped by the same intersubjective symbolic system, but only humans develop reflexive agency through socialization. This absence produces measurable differences in how AI argues without declaring its position or reflecting on its own assumptions.

Do humans and LLMs differ fundamentally or just superficially?

Applies Habermas's observer/participant distinction to AI: from the outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.

Next inquiring lines