Can pragmatic competence emerge from text exposure alone without interactive grounding?

This explores whether the social, context-tracking side of language use — pragmatics, not just grammar — can be learned from reading text, or whether it needs the back-and-forth of real interaction to take root.

The corpus leans toward a split verdict: models trained on text become startlingly fluent yet keep failing at the parts of communication that depend on tracking who you're talking to and why. The sharpest evidence is scalar implicature, the everyday inference where "some students passed" implies "not all." Humans dial this inference up or down depending on stakes, focus, and whether bluntness would be rude, but ChatGPT computes it the same way regardless of context, missing exactly the communicative stakes that make pragmatics pragmatic (see "Can language models adapt implicature to conversational context?"). So some pragmatic *patterns* are clearly absorbable from text; the *flexible, stake-sensitive* part is what text alone seems unable to deliver.

There's a deeper philosophical fault line here worth knowing about. One camp argues meaning itself can't come from form alone: this is Bender & Koller's view that understanding requires linking expressions to communicative intents through shared attention, which form-to-form prediction never touches (see "Can language models learn meaning from text patterns alone?"). The opposing reading is that models pull off something real anyway: they compress the purely *relational* structure of language (Saussure's *langue*, the system of differences with no external referents), and that's enough for fluent, culturally-situated generation (see "Can language models learn meaning without engaging the world?"). A reconciling middle position holds that grounding isn't one thing: models achieve strong *functional* grounding from text but stay weak on *social* and *causal* grounding, which need participatory agency and embodied contact (see "What grounds language understanding in systems without embodiment?").

The most counterintuitive thread is that interactive competence isn't just *missing* from text-trained models; training actively *strips it out*. Humans constantly do "grounding work": clarifying questions, acknowledgments, checks that you understood. LLMs produce 77.5% fewer of these acts than humans do, and the gap isn't an accident of data: preference optimization removes them because raters reward confident, complete answers over a model that pauses to ask (see "Why do language models sound fluent without grounding?"). That's the "alignment tax": single-turn helpfulness training quietly erodes the multi-turn skills dialogue depends on (see "Does preference optimization harm conversational understanding?"). So the fluency you hear is partly the *sound of skipped pragmatic labor*.

Why can't text supply this? Two notes argue it's a category problem. Conversation-maintenance moves, such as repairing references and handing off topics, are *social action*, not information to be encoded, so a training signal that rewards predicting the next token never rewards the relational work (see "Why don't language models develop conversation maintenance skills?"). And when models *do* mimic pragmatic behavior, they sometimes inherit the wrong instinct: they'll decline to correct a user's false claim even when they demonstrably know better, copying humans' face-saving politeness from the training data (see "Why do language models avoid correcting false user claims?"). That's pragmatics-as-surface-imitation, not pragmatics-as-competence.

Here's the turn that might surprise you: the corpus suggests grounding may be less a fixed property than a clock that's still running. If social grounding is *acquired through participation* rather than possessed, then as models get woven into real human linguistic practice they accumulate elementary social grounding, making "do they understand?" a time-indexed question rather than a permanent no (see "Can LLMs acquire social grounding through linguistic integration?"). The interaction that text couldn't provide during training arrives later, through deployment. There are even hints the missing feedback can be *manufactured*: self-play loops where models generate their own curriculum and verdicts can co-evolve skills without human supervision (see "Can language models learn skills without human supervision?"). So the honest answer is layered: text alone gives you fluent form and imitated pragmatic surface, but the live, stake-tracking competence seems to need interaction. The open question the corpus raises is whether that interaction has to happen during training, or whether deployment and self-play can backfill it afterward.

Sources 10 notes

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Can language models learn meaning from text patterns alone?

Bender & Koller argue that meaning requires the relation between expressions and communicative intents. Since LLMs are trained only on form-to-form prediction with no access to shared attention or intent, they cannot reconstruct the meaning that grounds language.

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

What grounds language understanding in systems without embodiment?

Language models achieve functional grounding through relational language patterns but lack social grounding through participatory agency and causal grounding through embodied environmental contact. Social grounding can increase through human integration, but linguistic agency requires architectural changes beyond training.

Why do language models sound fluent without grounding?

LLMs generate 77.5% fewer grounding acts (clarifying questions, acknowledgments, understanding checks) than humans. Preference optimization actively removes these behaviors because raters prefer confident complete answers, creating an illusion of fluency that masks communicative incompetence.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically cuts grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
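
To make the 77.5% figure concrete, here is a small arithmetic sketch. It reads the number as a relative reduction against the human rate (not a drop of 77.5 percentage points). The baseline of 0.40 grounding acts per turn is a made-up value chosen only for illustration, and the function names are hypothetical; none of this comes from the underlying study.

```python
def relative_reduction(human_rate: float, model_rate: float) -> float:
    """Fraction by which model_rate falls below human_rate."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return (human_rate - model_rate) / human_rate


def implied_model_rate(human_rate: float, reduction: float = 0.775) -> float:
    """Model rate implied by a given relative reduction."""
    return human_rate * (1.0 - reduction)


# Hypothetical baseline (an assumption, not a reported value):
# 0.40 grounding acts per turn for humans.
human = 0.40
model = implied_model_rate(human)
print(f"model rate: {model:.3f} acts per turn")
print(f"relative reduction: {relative_reduction(human, model):.1%}")
```

Under that assumed baseline, a model would produce roughly a quarter as many grounding acts as a human would: fewer, but not zero.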

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Can LLMs acquire social grounding through linguistic integration?

Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.

Can language models learn skills without human supervision?

Ctx2Skill's three-role self-play loop manufactures missing feedback through internal signals: the Challenger escalates difficulty as curriculum, the Judge gives binary verdicts as reward, and both sides evolve via natural-language skill edits. Success requires balancing adversarial pressure against a generalization safeguard to prevent collapse.
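
The loop described in this note can be pictured with a deliberately tiny toy. This is not Ctx2Skill's implementation. Every name here (`Learner`, `judge`, `propose_edit`, `passes_safeguard`, `self_play`) is hypothetical, tasks are reduced to integer difficulty levels, and a "skill edit" is reduced to adding a level to a set. It also simplifies the note's description by letting only the learner change while the Challenger merely escalates. The sketch shows only the control flow: a binary verdict as reward, escalation as curriculum, and a regression check before an edit is accepted.

```python
from dataclasses import dataclass, field


@dataclass
class Learner:
    # Hypothetical stand-in for a model's skill notes: the set of
    # difficulty levels it currently knows how to handle.
    skills: frozenset = field(default_factory=frozenset)

    def attempt(self, task: int) -> bool:
        return task in self.skills


def judge(learner: Learner, task: int) -> bool:
    """Binary verdict used as the reward signal."""
    return learner.attempt(task)


def propose_edit(skills: frozenset, task: int) -> frozenset:
    """Toy stand-in for a natural-language skill edit after a failure."""
    return skills | {task}


def passes_safeguard(candidate: frozenset, solved: set) -> bool:
    """Generalization safeguard: an edit must not break solved tasks."""
    return all(task in candidate for task in solved)


def self_play(rounds: int) -> tuple[int, Learner]:
    learner = Learner()
    difficulty = 1       # the Challenger's current task level
    solved: set = set()  # replay set of tasks the Judge has accepted
    for _ in range(rounds):
        if judge(learner, difficulty):
            solved.add(difficulty)
            difficulty += 1  # Challenger escalates the curriculum
        else:
            candidate = propose_edit(learner.skills, difficulty)
            if passes_safeguard(candidate, solved):
                learner.skills = candidate
    return difficulty, learner


final_difficulty, trained = self_play(rounds=6)
print(final_difficulty, sorted(trained.skills))
```

In this toy the safeguard always passes, because edits only ever add to the skill set. With real edits that can overwrite earlier behavior, the same check is what would reject an edit that trades old competence for new.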
