Why do language models respond passively instead of asking clarifying questions?
Explores whether the reward signals used to train language models might actively discourage them from seeking clarification or taking initiative in conversations, and what alternative training approaches might enable more collaborative dialogue.
CollabLLM makes the training mechanism behind passive responding explicit: "Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction." The result: models respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations.
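To make the mechanism concrete, here is a toy sketch of why a next-turn reward discourages clarification. The scoring heuristic below is a hypothetical stand-in, not CollabLLM's actual judge: the point is only that a reward which sees a single response in isolation cannot credit a clarifying question whose payoff arrives turns later.

```python
def next_turn_reward(context: str, response: str) -> float:
    """Score one response in isolation, as next-turn reward training does.

    Toy heuristic judge (illustrative only): rewards answer-shaped text
    and penalizes responses that defer by asking a question.
    """
    score = min(len(response) / 200.0, 1.0)  # longer, complete-looking = "helpful"
    if response.rstrip().endswith("?"):      # a clarifying question answers nothing yet
        score -= 0.5
    return score

# The direct (possibly wrong) answer outscores the clarifying question,
# even though clarification would make the *conversation* better.
print(next_turn_reward("Write a bio for me.", "Here is a polished 200-word bio: ..."))
print(next_turn_reward("Write a bio for me.", "What field are you in, and what tone should it take?"))
```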
The fix is multi-turn-aware rewards: rewards that estimate the long-term contribution of a response to overall interaction quality, not just its immediate helpfulness. By reinforcement fine-tuning with these rewards (sketched in code after this list), CollabLLM enables models to:
- Actively uncover user intent through clarifying questions
- Offer insightful suggestions that serve multi-turn goals
- Go beyond responding to requests toward genuine collaboration
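A minimal sketch of the multi-turn-aware reward idea follows, assuming hypothetical callables for the user simulator, the model policy, and a conversation-level judge. CollabLLM's actual recipe differs in its judges, metrics, and sampling details, but the shape is the same: roll the dialogue forward and credit the response with the expected quality of the whole interaction.

```python
import statistics
from typing import Callable

def multiturn_aware_reward(
    context: list[str],
    response: str,
    simulate_user: Callable[[list[str]], str],         # user-simulator policy (assumed)
    model_reply: Callable[[list[str]], str],           # the policy being trained (assumed)
    conversation_score: Callable[[list[str]], float],  # conversation-level judge (assumed)
    horizon: int = 3,
    n_samples: int = 4,
) -> float:
    """Monte-Carlo estimate of a response's long-term contribution:
    roll the dialogue forward with a user simulator, then score the
    resulting conversation as a whole."""
    returns = []
    for _ in range(n_samples):
        convo = context + [response]
        for _ in range(horizon):                   # simulate future turns
            convo.append(simulate_user(convo))     # user turn
            convo.append(model_reply(convo))       # model turn
        returns.append(conversation_score(convo))  # judge the full interaction
    return statistics.mean(returns)                # expected long-term quality
```

Under an estimator like this, a clarifying question earns credit whenever the continuations it unlocks end in better conversations, which is exactly the signal a next-turn reward throws away.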
This is a direct mechanistic explanation for the alignment tax. As established in Does preference optimization harm conversational understanding?, RLHF training degrades multi-turn reliability. CollabLLM identifies the specific training signal responsible, next-turn rewards, and proposes the specific fix: rewards that account for multi-turn consequences.
The connection to proactivity is also direct. As Why can't conversational AI agents take the initiative? argues, passivity is not just a missing feature; it is actively trained in by next-turn reward optimization. You cannot bolt proactivity onto a training signal that rewards only reactive helpfulness.
The CollabLLM framework is evaluated on three challenging tasks, including document creation: contexts where multi-turn collaboration is essential and single-turn helpfulness is insufficient. This grounds the claim in practical interaction scenarios rather than abstract capability measurement.
The Intent Mismatch paper directly supports this causal mechanism: it argues that premature assumptions in multi-turn conversation are rational under RLHF helpfulness training. Models construct plausible task formulations for "typical" users and produce provisional answers because the training objective penalizes evasion and rewards helpfulness. The proposed fix, a Mediator-Assistant architecture that decouples intent understanding from task execution (sketched below), complements CollabLLM's reward-signal approach with an architectural intervention. Both identify next-turn optimization as the root cause; they differ on whether the fix is changing the reward (CollabLLM) or restructuring the system (Intent Mismatch).
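As a rough illustration of the decoupling, here is a sketch in which the mediator owns intent resolution and only hands off to the assistant once intent is pinned down. The names, prompts, and control flow are hypothetical, not the paper's actual interface.

```python
from typing import Callable

LLM = Callable[[str], str]  # any text-in/text-out model (assumed interface)

def mediate(history: list[str], llm: LLM) -> dict:
    """Intent-understanding stage: decide whether the user's intent is
    unambiguous; if not, produce a single clarifying question."""
    probe = llm(
        "Conversation so far:\n" + "\n".join(history) +
        "\nIs the user's intent unambiguous? Reply YES, or ask one clarifying question."
    )
    if probe.strip().upper().startswith("YES"):
        return {"resolved": True, "intent": history[-1]}
    return {"resolved": False, "question": probe}

def respond(history: list[str], llm: LLM) -> str:
    """Task-execution stage runs only after the mediator resolves intent,
    so 'ask vs. answer' is an architectural decision, not a reward gamble."""
    state = mediate(history, llm)
    if not state["resolved"]:
        return state["question"]  # ask instead of guessing
    return llm("Complete this request: " + state["intent"])
```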
Related concepts in this collection
- Does preference optimization harm conversational understanding? Explores whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue. Relation: CollabLLM identifies next-turn rewards as the specific mechanism and proposes multi-turn rewards as the fix.
- Why can't conversational AI agents take the initiative? Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs, and what architectural changes might enable proactive dialogue. Relation: passivity is trained in by next-turn optimization.
- Does RLHF training push therapy chatbots toward problem-solving? Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness. Relation: a clinical-domain instance of next-turn reward bias.
- Why do language models lose performance in longer conversations? Asks whether multi-turn degradation stems from fundamental model limitations or from misalignment between what users mean and what models assume; understanding the root cause could guide better solutions. Relation: a complementary architectural fix to CollabLLM's reward-signal fix.
- Why do standard alignment methods ignore partner interventions? Standard RLHF and DPO optimize for token-level quality but may structurally prevent agents from meaningfully incorporating partner input; this note explores whether the training objective itself blocks collaborative reasoning. Relation: ICR demonstrates the deeper mechanism (next-turn rewards make agents blind to partner contributions) and proposes counterfactual invariance training as an alternative fix that produces partner-awareness as an emergent property, complementing CollabLLM's multi-turn reward approach.
Original note title: next-turn reward optimization limits multi-turn collaboration — multi-turn-aware rewards enable models to actively uncover intent rather than passively respond