Can training on text corpora teach what communicative acts produce?
This explores whether learning from text prediction alone can give a model the *effects* of communication — what an utterance does between people (repairing understanding, establishing shared ground, sustaining a relationship) — or only its surface form.
Put another way: training predicts the *shape* of communicative acts, but can it teach what those acts actually accomplish? The note corpus answers fairly sharply: no. The most interesting part is *why* the gap is structural rather than a matter of scale. The cleanest statement comes from Bender & Koller's argument that meaning lives in the relation between an expression and a communicative intent; since models are trained on form-to-form prediction with no access to shared attention, they can imitate the marks meaning leaves on text without reconstructing the intent that produced them (Can language models learn meaning from text patterns alone?). A communicative act is defined by what it does to a relationship between speakers, and that relational layer is exactly what text-prediction signals don't carry.
Several notes converge on the same point from different angles. One frames the missing ingredient as *social action*: conversation maintenance (reference repair, topic hand-off) sustains a relationship rather than conveying information, so a model rewarded for predicting information never develops it (Why don't language models develop conversation maintenance skills?). Another reframes it as *event structure*: AI output carries communicative markers inherited from its corpus but lacks the event that produces a real utterance, so users unilaterally animate the 'event-residue' into a pseudo-exchange that only has structure on the human side (Does AI generate genuine utterances or just text patterns?). Same gap, two vocabularies: the corpus teaches the residue, not the act.
The most striking evidence is quantitative. Models produce grounding acts (clarifications, acknowledgments, repairs, the moves that *build* shared understanding) 77.5% less often than humans, and they presume common ground exists rather than checking for it (Do language models actually build shared understanding in conversation?). This isn't only an absence in the data; it's actively trained out. Preference optimization rewards confident single-turn answers over questions that verify understanding, imposing an 'alignment tax' that erodes the very acts communication depends on (Does preference optimization harm conversational understanding?), while next-turn reward shaping teaches models to respond passively rather than discover what the user actually wants (Why do language models respond passively instead of asking clarifying questions?). So the answer compounds: text training can't teach the *effect* of communicative acts, and the dominant fine-tuning objectives then suppress even the imitation of them.
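To make the "77.5% less often" figure concrete, here is a minimal sketch of how such a relative reduction is computed. It assumes the figure is a relative drop in the rate of grounding acts; the notes do not state the unit of measurement. The function name `relative_reduction` and both rates are hypothetical, chosen only so the arithmetic lands on the cited figure. They are not data from the underlying study.

```python
def relative_reduction(model_rate: float, human_rate: float) -> float:
    """Return the fraction by which model_rate falls below human_rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


# Hypothetical rates (grounding acts per 100 turns), for illustration only.
human_rate = 40.0
model_rate = 9.0

reduction = relative_reduction(model_rate, human_rate)
print(f"Model grounding-act rate is {reduction:.1%} below the human rate")
```

Read this way, the figure means that for every 40 grounding acts a human would produce in comparable dialogue, the model produces about 9. It does not mean that 77.5% of model turns lack grounding.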
Passing a behavioral test of communication doesn't close the gap; it disguises it. Chalmers-style interpretability tests are satisfied by any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions, such as accountability and an evaluative stance, that text output alone can't demonstrate. The test is calibrated to the wrong phenomenon, generating confident false positives (Does behavioral speech output prove communicative subjecthood?). The fluency that makes a model seem like it understands what its words *do* is the same fluency that hides the absence: authoritative framing stands in for genuine calibration.
The takeaway: the limit isn't that models haven't read enough. A communicative act is a relational event, and a corpus only ever records its trace. You can become a perfect predictor of the trace and still never have performed the act, which is why the most fluent systems are precisely the ones whose missing communicative work is hardest to notice.
Sources (7 notes)
Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.
AI output carries communicative markers inherited from training data but lacks the event structure that produces actual utterances. Users supply the missing orientation through interpretive labor, creating a pseudo-event with structure only on the human side.
LLMs produce grounding acts—clarifications, acknowledgments, repairs—77.5% less frequently than humans. They generate fluent responses without verifying shared understanding, relying instead on authoritative framing that masks the absence of genuine communicative calibration.
RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically pushes grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.
Chalmers' test passes any system producing contextually appropriate text, but communicative subjecthood requires relational-normative conditions like accountability and evaluative stance. The test is calibrated to the wrong phenomenon, creating false positives, like puppets that make walk-shaped movements without walking.