What six problems must every conversation solve?
Schegloff's Conversation Analysis identifies six universal organizational challenges that speakers navigate in all talk-in-interaction. Understanding these helps explain why current AI dialogue systems fall short of human fluency.
Schegloff calls these the six "generic orders of organization" — problems that every conversation must solve for orderly interaction to proceed:
Turn-taking — who should talk next and when? How does this affect the construction and understanding of turns themselves?
Action-formation — how are the resources of language, body, environment, and position fashioned into recognizable actions (requesting, inviting, complaining, agreeing, rejecting) in a class of unknown size?
Sequence-organization — how are successive turns formed to be "coherent" with prior turns, and what is the nature of that coherence?
Trouble-handling — how do participants deal with problems in speaking, hearing, and understanding so that interaction doesn't freeze, intersubjectivity is maintained or restored, and the sequence can progress?
Word-selection — how are the components of a turn selected, and how does that selection shape understanding by recipients?
Overall structural organization — how does the overall composition of an interaction get structured, and how does placement inform the construction and understanding of talk?
These are not merely theoretical constructs — they have empirical support and appear to be language-universal. Properties of sequence organization also generalize to text-based chat, though digital interaction may follow somewhat different patterns.
The practical relevance for AI: current dialogue systems explicitly address only turn-taking (who responds) and action-formation (intent classification). Sequence-organization is partially addressed through context windows. Trouble-handling (repair) is almost entirely absent — as Do language models actually build shared understanding in conversation? argues, models skip the repair sequences that humans use to maintain intersubjectivity. Word-selection receives some attention through style control. Overall structural organization is ignored.
This means current AI dialogue addresses roughly 2 of 6 fundamental conversational requirements. The other 4 represent a design space that remains largely unexplored.
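To make that tally concrete, here is a minimal sketch (plain Python; the names and mechanism labels are this note's own summary, not any system's API) of the coverage claim:

```python
# Illustrative only: Schegloff's six generic orders, annotated with how
# current AI dialogue systems address each one (per the claims above).
GENERIC_ORDERS = {
    "turn-taking":            {"addressed": True,  "mechanism": "response scheduling (who responds)"},
    "action-formation":       {"addressed": True,  "mechanism": "intent classification"},
    "sequence-organization":  {"addressed": False, "mechanism": "context windows (partial at best)"},
    "trouble-handling":       {"addressed": False, "mechanism": "repair sequences almost entirely absent"},
    "word-selection":         {"addressed": False, "mechanism": "style control (partial at best)"},
    "overall-structural-org": {"addressed": False, "mechanism": "ignored"},
}

covered = sum(order["addressed"] for order in GENERIC_ORDERS.values())
print(f"{covered} of {len(GENERIC_ORDERS)} orders explicitly addressed")  # 2 of 6
```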
Alongside What three layers must discourse systems actually track?, Schegloff's six orders provide the conversational-level complement to Grosz & Sidner's discourse-level framework. The two are not competing — they describe different levels of the same phenomenon.
Related concepts in this collection
- What three layers must discourse systems actually track?
  Grosz and Sidner's 1986 framework proposes that discourse requires simultaneously tracking linguistic segments, speaker purposes, and salient objects. Understanding why all three are necessary helps explain where current AI systems structurally fail.
  discourse-level complement to Schegloff's conversational-level framework
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  repair (trouble-handling) is the specific mechanism for maintaining intersubjectivity that LLMs skip
- Why do language models sound fluent without grounding?
  Explores whether LLM fluency masks the absence of communicative work—the clarifying questions, acknowledgments, and understanding checks that humans perform. Why does skipping these acts make models sound more confident?
  grounding acts span several of Schegloff's six orders
- Why do language models skip the calibration step?
  Current LLMs assume shared understanding rather than building it through dialogue. This explores why that design choice persists and what breaks when it fails.
  dynamic grounding addresses trouble-handling specifically
- Can AI systems detect and correct misunderstandings after responding?
  How do conversational systems recognize when their previous response was based on a misunderstanding, and what mechanism allows them to correct it retroactively rather than restart?
  third-position repair (TPR) is a specific instantiation of the trouble-handling generic order: reactive correction at T3 after the response at T2 reveals a misunderstanding
- What semantic failures break dialogue coherence most realistically?
  Can we distinguish distinct types of incoherence by manipulating semantic structure rather than surface text? This matters because text-level evaluations miss the semantic failures that actually occur in dialogue systems.
  DEAM's semantic-level failure modes map to specific generic orders: contradiction/coreference → trouble-handling, decreased engagement → action-formation, irrelevancy → sequence-organization
- Why do dialogue systems lose context when topics return?
  Stack-based dialogue management removes topics after they're resolved, making it hard for systems to reference them later. Does this structural rigidity explain why conversational AI struggles with topic revisitation?
  topic management is a specific instantiation of the "overall structural organization" generic order; the stack-vs-attention debate addresses how this order should be solved architecturally
- Can models learn when NOT to speak in conversations?
  Does training AI to explicitly predict silence—through a dedicated silent token—help models understand when intervention adds value versus when they should stay quiet? This matters for building conversational agents that feel naturally helpful rather than intrusive.
  DiscussLLM directly addresses the turn-taking generic order: the silent token mechanism trains models to solve the "who should talk next and when" problem, which Schegloff identifies as the first organizational requirement of conversation
- When should AI agents ask users instead of just searching?
  Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
  insert-expansions instantiate sequence-organization and trouble-handling simultaneously: they create embedded pair structures within the main sequence (sequence-organization) to repair potential misunderstanding before it compounds (trouble-handling); see the sketch after this list
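A minimal sketch of that insert-expansion pattern, assuming a hypothetical data model and agent loop (this is not any cited system's implementation):

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    speaker: str
    text: str

@dataclass
class Sequence:
    base_first: Turn                                   # Q1: the user's original request
    insert: list[Turn] = field(default_factory=list)   # Q2/A2: embedded clarification pair
    base_second: Turn | None = None                    # A1: withheld until the insert resolves

def respond(seq: Sequence, confident: bool) -> Sequence:
    """Produce the next agent turn, opening an insert expansion when uncertain."""
    if not confident and not seq.insert:
        # Trouble-handling: initiate repair with an embedded first pair part (Q2)
        # instead of answering Q1 on a guess.
        seq.insert.append(Turn("agent", "Do you mean X or Y?"))  # hypothetical wording
        return seq
    # Sequence-organization: the base second pair part (A1) is produced only
    # after the embedded pair has closed, keeping the overall sequence coherent.
    seq.base_second = Turn("agent", "Here is the answer to your original request.")
    return seq

seq = Sequence(base_first=Turn("user", "Book me the usual."))
seq = respond(seq, confident=False)    # agent opens the insert expansion (Q2)
seq.insert.append(Turn("user", "X"))   # user's reply (A2) closes the embedded pair
seq = respond(seq, confident=True)     # now the base answer (A1) can be produced
```

The structural point is that A1 is deferred, not skipped: the embedded Q2/A2 pair repairs potential misunderstanding before it can compound into a wrong answer.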
Original note title
sequence organization in talk-in-interaction has six generic orders that all conversation must solve — turn-taking, action-formation, sequence-organization, trouble-handling, word-selection, and overall structure