Why do dialogue failures persist despite scaling language models?
If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? This question asks whether dialogue failures are capability gaps or structural mismatches in the training mode.
The vast majority of LLM training data is written monological text: articles, essays, books, web pages, documentation, code, social media posts. Even text that records dialogue (interview transcripts, fiction with conversation, forum threads) appears in the corpus as written text — third-person record of dialogue, not first-person dialogical engagement. The model trains by predicting the next token in this corpus, which means the operation it learns is text-continuation: given a span of writing, what comes next.
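To make the objective concrete, here is a minimal sketch of the text-continuation loss, assuming a PyTorch-style model that maps token ids to next-token logits. The model interface and tensor shapes are illustrative, not any particular system's:

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens: torch.Tensor) -> torch.Tensor:
    """Cross-entropy for predicting token t+1 from tokens 0..t.

    tokens: (batch, seq_len) ids drawn from written text. The loss is the
    same whether the span is an essay, code, or a transcript that happens
    to record a dialogue: every document is one monological stream.
    """
    logits = model(tokens[:, :-1])            # (batch, seq_len - 1, vocab)
    targets = tokens[:, 1:]                   # shift targets by one position
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),  # flatten all positions
        targets.reshape(-1),
    )
```

The shape of the supervision is identical for every document: each position's target is simply the token the original author wrote next.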
This training mode is monological. The model is never in dialogue during training. It never has to coordinate with another agent, never has to repair misunderstanding, never has to track another speaker's perspective updating in real time. The dialogical operation — two agents addressing each other, building shared understanding through reciprocal moves — is not a training signal. The model can only encounter this operation as text-about-it, not as text-of-it.
The failure modes of LLM dialogue track this exactly. Topic drift in multi-turn conversation: the model lacks a persistent intentional structure for the dialogue because dialogues weren't training units. Presumption of common ground rather than its construction: the model has no training signal for the construction-of-common-ground operation, so it produces output as if the ground is already shared. Absence of conversational repair: the model has no training signal for the repair operation, so it does not perform repair when context indicates it is needed. Each failure is the absence of an operation the training mode never required the model to perform.
The diagnostic significance: many of LLM dialogue's failures are not capability deficits in the model — they are absences in the training mode. No amount of additional written-text training will produce the operations, because the operations are not in the training data and cannot be inferred from it. The training mode determines what failure modes will appear; structural changes to the training mode (training models in actual dialogue with other agents) would be required to address them.
This is why the standard "fix LLM dialogue with more text" approach has produced limited progress on dialogue-specific failures despite continued progress on text-continuation tasks. The problems that scale solves are problems within the training mode; dialogue-specific problems lie outside it. They require a different kind of training, or a different post-training intervention, neither of which is purely a scale problem. The human-acquisition companion claim, "Is all human language use fundamentally communicative?", explains why the asymmetry matters.
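To make "a different kind of training" less abstract, here is a loose, hypothetical sketch of an interactive training step in which the learning signal depends on another agent's live response. Nothing below is an existing API; agent_a, agent_b, task, and grounding_reward are illustrative names for an interface that would have to be built:

```python
def dialogic_training_step(agent_a, agent_b, task, optimizer):
    """One interactive episode: the learning signal depends on what the
    other agent actually does in response, not on a fixed next token."""
    history = [task.opening_prompt()]
    for _ in range(task.max_turns):
        move_a = agent_a.respond(history)  # A commits to a conversational move
        history.append(move_a)
        move_b = agent_b.respond(history)  # B's reply exists in no corpus:
        history.append(move_b)             # it reacts to A's actual move
    # The reward scores whether shared understanding was built, e.g. whether
    # B can now complete the task A was trying to convey.
    reward = task.grounding_reward(history)
    # REINFORCE-style update: reinforce A's moves in proportion to outcome.
    loss = -reward * agent_a.log_prob_of_own_moves(history)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The contrast with the text-continuation loss above is the point: the target here is an interaction outcome that did not exist until the episode was played out, not a token fixed in a corpus.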
The strongest counterargument: dialogue-specific fine-tuning and RLHF on conversational examples partially close the gap. Yes, partially — and the partiality is informative. The fine-tuning can produce more dialogue-like surface output without producing the underlying operations, because the post-training signal is still text, just dialogue-shaped text. The mode is unchanged.
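The point about dialogue-shaped text can be made concrete. In standard supervised fine-tuning, a conversation is serialized into a single string before the same next-token loss is applied. A minimal sketch, with hypothetical role tags rather than any vendor's actual chat template:

```python
def flatten_chat(turns: list[dict[str, str]]) -> str:
    """Serialize a conversation into one string for next-token training.

    After this step the objective is unchanged: predict the next token of
    a monological document that happens to look like dialogue. Neither
    party in the transcript ever reacts to anything the model itself did.
    """
    return "".join(f"<|{t['role']}|>{t['content']}<|end|>" for t in turns)

chat = [
    {"role": "user", "content": "Wait, which failure did you mean?"},
    {"role": "assistant", "content": "Topic drift, not repair."},
]
print(flatten_chat(chat))
# <|user|>Wait, which failure did you mean?<|end|><|assistant|>Topic drift, not repair.<|end|>
```

Nothing in this pipeline conditions an update on how a live interlocutor responds to the model's own move; the dialogue is frozen before training begins.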
Source: Communication vs Language
Related concepts in this collection

- Is all human language use fundamentally communicative? (the human-acquisition companion claim) Does human language always involve addressing another person, even in private writing or internal thought? This matters because it challenges how we define language use itself.
- Are language models and human speakers doing the same thing? (the meta-discourse claim that follows from this training-mode asymmetry) Does treating LLM output and human communication as equivalent operations mask fundamental differences in how they work? This distinction shapes how we assess AI capabilities and risks.
- Why don't conversational AI systems mirror their users' word choices? (one of the specific dialogue failure modes this training-mode claim explains) Explores whether current dialogue models exhibit lexical entrainment, the human tendency to align vocabulary with conversation partners, and what's needed to bridge this gap in AI communication.
Original note title: LLMs are trained monologically on written language not dialogically in conversation — training mode determines failure mode