When should AI agents ask users instead of just searching?
Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
Tool-enabled LLMs have a structural problem: when they can't immediately answer a query, they chain tool calls (search, calculation, code execution), and each intermediate step is conditioned on the output of the previous one. The result is progressive divergence from the user's original intent. The more tools the model uses, the further it drifts.
Conversation Analysis (Schegloff, 2007) offers a formal alternative from human talk-in-interaction. When human speakers can't immediately provide the expected response, they don't silently think harder — they insert a new pair of utterances to bridge the gap. These "insert-expansions" serve three functions: clarifying intent ("Do you mean the downtown location?"), scoping responses ("Are you looking for something under $50?"), and enhancing appeal ("I should mention it also comes in blue").
The key move is the "user-as-a-tool" paradigm: instead of the model consulting external tools and accumulating drift, it consults the user. The user provides necessary details and refines their request. This replicates exactly the structure of human insert-expansions — post-first inserts recover from misunderstandings, pre-second inserts gather information needed to choose the right response.
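To make the paradigm concrete, here is a minimal sketch of an agent loop in which the user is registered as one tool among many, with the three insert-expansion functions above encoded as question types. Everything here (`llm_decide`, `run_tool`, `ask_user`, the `Action` shape) is a hypothetical illustration under assumed interfaces, not the paper's implementation:

```python
from dataclasses import dataclass
from enum import Enum, auto

class InsertExpansion(Enum):
    # The three insert-expansion functions described above.
    CLARIFY_INTENT = auto()   # "Do you mean the downtown location?"
    SCOPE_RESPONSE = auto()   # "Are you looking for something under $50?"
    ENHANCE_APPEAL = auto()   # "I should mention it also comes in blue."

@dataclass
class Action:
    tool: str                                  # "search", "answer", "ask_user", ...
    payload: str                               # tool input, question text, or final answer
    expansion: InsertExpansion | None = None   # set only when tool == "ask_user"

def run_agent(query, llm_decide, run_tool, ask_user, max_steps=8):
    """Agent loop in which the user is registered as just another tool."""
    context = [("user", query)]
    for _ in range(max_steps):
        action = llm_decide(context)  # the policy chooses the next step
        if action.tool == "answer":
            return action.payload
        if action.tool == "ask_user":
            # Insert-expansion: consult the user instead of an external tool,
            # so the next step is conditioned on the user's stated intent.
            context.append(("user", ask_user(action.payload)))
        else:
            # Ordinary tool call: the next step is conditioned on machine
            # output, which is where drift from the original intent creeps in.
            context.append(("tool", run_tool(action.tool, action.payload)))
    return "Step budget exhausted without an answer."
```

The design point is the branch, not the helpers: `ask_user` is structurally identical to any other tool call, so a pre-second insert (gather information before choosing the response) drops into an existing agent loop without a new control path.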
The empirical evidence from recommendation tasks shows benefits from this approach. But the deeper point is architectural: given the passivity problem posed in "Why can't conversational AI agents take the initiative?", the insert-expansion framework gives a principled answer to WHEN agents should break passivity — not by adding unsolicited content, but by asking structured questions when their internal processing would otherwise diverge.
This connects to the distinction between formal and functional linguistic competence: LLMs have formal competence (handling language in itself) but lack functional competence (doing things WITH language — reasoning, using world knowledge, establishing common ground). Insert-expansions are a functional linguistic capability. The paper argues that natural speech patterns may emerge as a side-effect of more closely imitated reasoning paths — if agents reason through dialogue rather than through silent chains.
As "Does preference optimization harm conversational understanding?" argues, insert-expansions are precisely the kind of conversational work that RLHF training discourages — they slow things down, ask questions instead of answering, and score lower on single-turn helpfulness ratings, despite being more effective for multi-turn interaction. Insert-expansions are the PRE-EMPTIVE half of the repair space; as "Can AI systems detect and correct misunderstandings after responding?" describes, third-position repair (TPR) provides the REACTIVE half: correcting a misunderstanding after it has already been acted on. Together they cover the full repair lifecycle: insert-expansions prevent, TPR recovers.
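As a sketch of how the two halves divide the lifecycle: the functions below are hypothetical stand-ins (`uncertain` and `misunderstood` represent whatever detection mechanism the agent actually uses), not an interface from either paper:

```python
def answer_with_repair(query, agent, uncertain, misunderstood, ask_user):
    """Pre-emptive insert-expansion before acting; reactive third-position
    repair (TPR) after acting."""
    # Pre-emptive half: probe before committing to an interpretation.
    if uncertain(query):
        clarification = ask_user("Could you clarify what you're after?")
        query = f"{query}\n[user clarified: {clarification}]"
    answer = agent(query)
    # Reactive half: the response reveals a misunderstanding, so repair it
    # in third position instead of restarting the whole exchange.
    if misunderstood(answer):
        correction = ask_user("I may have misread your request. What did you mean?")
        answer = agent(f"{query}\n[correction: {correction}]")
    return answer
```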
The insert-expansion framework also connects to a trainable capability. As "Can models learn to ask clarifying questions instead of guessing?" reports, RL training can raise proactive-questioning accuracy from 0.15% to 73.98% — but the insert-expansion framework provides the conversational-analytic structure for WHEN and HOW to deploy that capability in dialogue, not just whether the model can detect missing information.
Related concepts in this collection
- Why can't advanced AI models take initiative in conversation?
  Despite extraordinary capability in answering and reasoning, LLMs fundamentally cannot initiate, redirect, or guide exchanges. Understanding this gap—and whether it's fixable—matters for building AI that truly collaborates rather than merely responds.
  Insert-expansions address one form of passivity: the agent should probe when uncertain, not silently diverge.

- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts—clarifications, checks, acknowledgments—that actually build shared understanding in dialogue.
  RLHF penalizes exactly the conversational work insert-expansions perform.

- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  Insert-expansions are a specific mechanism for building common ground.

- Why do language models sound fluent without grounding?
  Explores whether LLM fluency masks the absence of communicative work—the clarifying questions, acknowledgments, and understanding checks that humans perform. Why does skipping these acts make models sound more confident?
  Insert-expansions are the communicative work that fluent models skip.

- Can models learn to ask clarifying questions instead of guessing?
  Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
  Insert-expansions provide the conversational structure for deploying proactive questioning in dialogue.

- Can AI systems detect and correct misunderstandings after responding?
  How do conversational systems recognize when their previous response was based on a misunderstanding, and what mechanism allows them to correct it retroactively rather than restart?
  A complementary repair mechanism: insert-expansions are pre-emptive, TPR is reactive; together they cover the full repair lifecycle.

- Which clarifying questions actually improve user satisfaction?
  Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.
  Insert-expansions define WHEN to probe; this research defines HOW to probe well — specific-facet questions outperform need-rephrasing, providing the content design principles for insert-expansion sequences.
Original note title: insert-expansions from conversation analysis provide a formal framework for when tool-enabled agents should probe users instead of silently diverging