What makes relational structure sufficient for generating contextually appropriate discourse?

This explores a tension hiding inside the question: whether learning the relationships among words in text — with no grounding in the world — is genuinely *enough* to produce discourse that fits its moment, or whether 'contextually appropriate' is exactly where pattern-from-relation runs out.

Relational structure here means the web of how words and phrases co-occur and contrast in text. The corpus gives a split verdict on whether that web is sufficient by itself, and the split is the interesting part. On the 'sufficient' side, there's a strong claim that fluent, culturally situated language needs nothing but relational compression: models effectively operationalize Saussure's *langue*, learning meaning as a system of differences with no external referent or embodied grounding required (Can language models learn meaning without engaging the world?). By that account, relational structure isn't just sufficient; it's the whole engine, and the appropriateness comes for free because the patterns it compresses are already patterns of *situated use*.

But look at what the same corpus finds when 'context' stops being a statistical regularity and starts being something you have to *infer in the moment*. Discourse competence turns out to be asymmetric: models handle explicit connectives ("because," "however") well but collapse to about 25% accuracy (24.54% in the cited ChatGPT evaluation) on implicit relations, where the link has to be reconstructed from meaning rather than read off a surface marker (Why does ChatGPT fail at implicit discourse relations?). The same shape recurs with reasoning: causal relations are handled better than temporal ones precisely because causal connectives are explicit and frequent in training, while temporal order is usually left implicit (Why do LLMs handle causal reasoning better than temporal reasoning?). So relational structure is sufficient *when the relation is lexicalized in the surface form*, and brittle when appropriateness depends on inferring an unmarked link.

The limit gets sharper around pragmatics. Scalar implicature, deciding whether "some" implies "not all," is something humans flexibly modulate by communicative stakes, focus, and face-threat. Models show essentially no such context-sensitivity; they compute the inference the same way regardless of what the situation demands (Can language models adapt implicature to conversational context?). That suggests relational structure captures the *default* reading of a form but not the live recalibration that makes discourse appropriate to *this* exchange. The deeper version of that gap is conversational: models treat the opening prompt as a fixed frame and can't jointly update common ground, so when a user pivots or contradicts, the shared scoreboard isn't revised and the human ends up maintaining it alone (Can LLMs truly update shared conversational common ground?).

There's a complementary clue in how appropriateness is actually *produced* rather than inferred. Chain-of-thought work finds that the format and spatial scaffolding of an answer shape output far more than its logical content: invalid reasoning chains work nearly as well as valid ones, because what's being generated is pattern-guided continuation, not formal computation (What makes chain-of-thought reasoning actually work?). Read alongside the Saussure claim, this reframes the whole question: relational structure is sufficient for the *appearance* and often the substance of appropriate discourse because appropriateness, in text, is largely a learnable surface regularity. It becomes insufficient exactly where appropriateness requires tracking something the text doesn't encode: the other party's evolving commitments, the stakes of the current move, an inference no connective spells out.

The quietly useful takeaway: 'contextually appropriate discourse' isn't one capability but two that come apart. One is reproducing the situated patterns of language as a closed relational system, and there relational structure really is sufficient. The other is *negotiating* context as a moving target between two minds, and there the corpus keeps finding the same wall, whether the symptom is implicit relations, unadapted implicature, or uneditable common ground. Even the persuasion literature fits this seam: presuppositions persuade more than assertions because they smuggle new content in as already-shared background (Why are presuppositions more persuasive than direct assertions?), a maneuver that works *on* the relational frame precisely because the model can't contest the frame.

Sources (7 notes)

Can language models learn meaning without engaging the world?

Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.

Why does ChatGPT fail at implicit discourse relations?

ChatGPT performs well on explicit discourse relations with connectives but achieves only 24.54% accuracy on implicit relations without them. This asymmetry reveals that LLMs rely on surface signals rather than inferring meaning from semantic content.

Why do LLMs handle causal reasoning better than temporal reasoning?

ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.

Can language models adapt implicature to conversational context?

ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, which leaves the user as the sole maintainer of the conversational scoreboard.

What makes chain-of-thought reasoning actually work?

Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.

Why are presuppositions more persuasive than direct assertions?

Experimental evidence shows presuppositions with additive, iterative, and factive triggers persuade audiences more than assertions, especially for discourse-new content. The mechanism: presuppositions bypass evaluative scrutiny by presenting claims as already-accepted background.
