Language Understanding and Pragmatics in Conversational AI Systems

Why do dialogue failures persist despite scaling language models?

If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? The question is whether dialogue failures reflect capability gaps or a structural mismatch in how the models are trained.

Note · 2026-04-14
What kind of thing is an LLM really?

The vast majority of LLM training data is written monological text: articles, essays, books, web pages, documentation, code, social media posts. Even text that records dialogue (interview transcripts, fiction with conversation, forum threads) appears in the corpus as written text — third-person record of dialogue, not first-person dialogical engagement. The model trains by predicting the next token in this corpus, which means the operation it learns is text-continuation: given a span of writing, what comes next.
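The point about text-continuation can be made concrete. A minimal sketch (a toy illustration, not any particular model's pipeline): the next-token objective converts any text, including a transcript of dialogue, into the same kind of training pair, a span of preceding tokens and the token that follows.

```python
# Toy sketch of the next-token training signal. Names here
# (next_token_pairs, context_size) are illustrative, not from
# any real library.

def next_token_pairs(tokens, context_size=3):
    """Turn a token sequence into (context, target) pairs,
    the shape of every example a text-continuation objective sees."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs

# A transcript of dialogue is just more text to this objective:
transcript = "A: hello B: hi A: how are you".split()
for context, target in next_token_pairs(transcript)[:3]:
    print(context, "->", target)
```

Nothing in these pairs distinguishes recorded dialogue from an essay; the objective is indifferent to whether the text once involved two agents.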

This training mode is monological. The model is never in dialogue during training. It never has to coordinate with another agent, never has to repair misunderstanding, never has to track another speaker's perspective updating in real time. The dialogical operation — two agents addressing each other, building shared understanding through reciprocal moves — is not a training signal. The model can only encounter this operation as text-about-it, not as text-of-it.

The failure modes of LLM dialogue track this exactly. Topic drift in multi-turn conversation: the model lacks a persistent intentional structure for the dialogue because dialogues weren't training units. Presumption of common ground rather than its construction: the model has no training signal for the construction-of-common-ground operation, so it produces output as if the ground is already shared. Absence of conversational repair: the model has no training signal for the repair operation, so it does not perform repair when context indicates it is needed. Each failure is the absence of an operation the training mode never required the model to perform.

The diagnostic significance: many of LLM dialogue's failures are not capability deficits in the model — they are absences in the training mode. No amount of additional written-text training will produce the operations, because the operations are not in the training data and cannot be inferred from it. The training mode determines what failure modes will appear; structural changes to the training mode (training models in actual dialogue with other agents) would be required to address them.

This is why the standard "fix LLM dialogue with more text" approach has produced limited progress on dialogue-specific failures despite continued progress on text-continuation tasks. The problems that scale solves are problems within the training mode; the problems specific to dialogue are not. They require a different kind of training, or a different post-training intervention — neither of which is purely a scale problem. "Is all human language use fundamentally communicative?" is the companion claim about human acquisition that explains why the asymmetry matters.

The strongest counterargument: dialogue-specific fine-tuning and RLHF on conversational examples partially close the gap. Yes, partially — and the partiality is informative. The fine-tuning can produce more dialogue-like surface output without producing the underlying operations, because the post-training signal is still text, just dialogue-shaped text. The mode is unchanged.
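That "dialogue-shaped text" point can also be shown directly. A hedged sketch (the serialization format below is invented for illustration; real chat-tuning pipelines use their own templates): conversational fine-tuning data reaches the model only after the structured exchange has been flattened into one string, at which point the signal is again next-token prediction over text.

```python
# Illustrative only: how a structured dialogue becomes a flat
# training string. The template here is made up; the structural
# point is that any template yields plain text in the end.

def to_training_text(turns, sep="\n"):
    """Flatten (speaker, utterance) turns into one string --
    the form in which 'dialogue' fine-tuning data is consumed."""
    return sep.join(f"{speaker}: {utterance}" for speaker, utterance in turns)

dialogue = [("User", "Where did we leave off?"),
            ("Assistant", "We were discussing training modes.")]
print(to_training_text(dialogue))
```

The reciprocal structure of the exchange survives only as surface formatting, which is exactly why such tuning can shape the output without installing the underlying operations.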


Source: Communication vs Language

Original note title

LLMs are trained monologically on written language not dialogically in conversation — training mode determines failure mode