Can models learn to ask clarifying questions instead of guessing?
Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
Current LLMs face three failure modes when receiving flawed or incomplete queries: they hallucinate an answer, they refuse to respond, or they provide a generic "I need more information" deflection. None of these is productive. The proactive critical thinking paradigm introduces a fourth option: identify specifically what is missing and generate a targeted question to request it.
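A minimal sketch of what this fourth option might look like in practice is below; the prompt wording and the `llm_complete` callable are illustrative placeholders, not the paper's training setup or implementation.

```python
# Minimal sketch of the "fourth option": instead of answering, refusing, or
# deflecting, the model is asked to name the specific missing piece and
# request it. `llm_complete` stands in for any chat-completion call.

PROACTIVE_PROMPT = """You are solving the user's problem.
First check whether every quantity needed for the answer is stated.
If something is missing, do NOT guess and do NOT refuse.
Instead, reply with exactly one question naming the missing quantity.

Problem: {query}"""

def respond(query: str, llm_complete) -> str:
    """Return either a solution or a targeted clarifying question."""
    return llm_complete(PROACTIVE_PROMPT.format(query=query))

# Intended behavior on an ill-posed problem:
#   respond("A train travels for 3 hours. How far does it go?", llm)
#   -> "What is the train's speed during those 3 hours?"
```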
The GSM-MC benchmark tests this by deliberately removing key variables from math problems. Results are dramatic:
- Vanilla models: 0.15% accuracy on proactive critical thinking tasks
- After RL training: 73.98% accuracy (Qwen3-1.7B)
- SFT alone: effective, though RL is generally superior
The near-zero baseline reveals something important: despite extensive post-training that makes these models excellent at reasoning, they have almost no ability to detect when a problem is ill-posed and actively seek the missing piece. This is a specific capability gap, not a general reasoning limitation.
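For concreteness, here is one way an ill-posed variant might be derived from a well-posed GSM-style item and scored; this is an assumed construction for illustration, not the benchmark's released pipeline.

```python
# Assumed sketch of building a proactive-critical-thinking item by deleting a
# key variable from a solvable problem, plus a simplistic scoring rule.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Item:
    question: str            # problem text shown to the model
    expected_ask: str        # variable a proactive model should request
    answer: Optional[float]  # None marks a deliberately unanswerable item

solvable = Item(
    question="A train travels at 60 mph for 3 hours. How far does it go?",
    expected_ask="",
    answer=180.0,
)

# Remove the speed to create the ill-posed variant:
ill_posed = Item(
    question="A train travels for 3 hours. How far does it go?",
    expected_ask="speed",
    answer=None,
)

def score(model_reply: str, item: Item) -> bool:
    """Credit the model only if it asks about the removed variable
    (a crude substring check; real grading would be stricter)."""
    return item.answer is None and item.expected_ask in model_reply.lower()
```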
A striking secondary finding: inference-time scaling (activating "thinking mode") actually degrades proactive critical thinking in vanilla models. The extended thinking induces "counterproductive self-doubt rather than useful analysis, leading to a clear drop in performance." But after RL training, thinking mode becomes beneficial — the same mechanism that hurts untrained models helps trained ones.
This finding matters beyond math: a patient omitting critical symptoms, a user providing incomplete specifications, a student asking an ambiguous question — all require the agent to identify what's missing and ask, not just refuse or guess. Seen through the lens of "Why can't conversational AI agents take the initiative?", proactive critical thinking is a concrete, trainable instantiation of the broader proactivity gap.
ProCoT (Proactive Chain-of-Thought) extends the paradigm from individual queries to multi-turn goal planning: rather than just detecting missing information in a single exchange, models generate explicit reasoning chains about conversation goals and plan proactive interventions across turns. This bridges proactive critical thinking (reactive: "this query is incomplete") with proactive dialogue (strategic: "given the user's goal, I should ask about X before they realize they need it").
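An approximate rendering of what a ProCoT-style prompt might look like follows; the action set and wording are illustrative rather than the paper's exact template.

```python
# Illustrative ProCoT-style prompt: the model first reasons about the
# conversation goal, then commits to a dialogue act, then produces the turn.

PROCOT_TEMPLATE = """Conversation so far:
{history}

Available actions: answer directly | ask a clarifying question | refuse

First, reason step by step about the user's underlying goal and whether any
information needed to reach it is still missing.
Then choose one action and write the next turn.

Reasoning:"""

def next_turn(history: str, llm_complete) -> str:
    """Plan the next proactive move across turns, not just within one query."""
    return llm_complete(PROCOT_TEMPLATE.format(history=history))
```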
The ALFA framework for clinical reasoning extends this by showing that question quality is multidimensional — a question can be clear but irrelevant, or relevant but ambiguous. ALFA decomposes "good question" into theory-grounded attributes (clarity, relevance, specificity) and trains against each via 80K attribute-specific preference pairs. This addresses a gap: proactive critical thinking shows models can learn to ask, but ALFA shows they need attribute-specific training to ask well. Additionally, research on clarifying question design shows that specific-facet questions ("What type of monitor?") consistently outperform need-rephrasing questions ("Can you be more specific?") for user satisfaction — the form of the question matters as much as the decision to ask.
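A rough sketch of how attribute-specific preference pairs in the spirit of ALFA could be assembled is shown below; the example questions and the `to_dpo_records` helper are hypothetical, and the real 80K-pair dataset is constructed with far more care.

```python
# Each pair degrades a good question along exactly one attribute, so the
# preference signal isolates that attribute. Attribute names follow the note
# (clarity, relevance, specificity); examples are invented for illustration.

good = "How long has the chest pain lasted, and does it worsen with exertion?"

preference_pairs = [
    # (attribute, preferred question, rejected question)
    ("clarity",     good, "Chest pain duration exertion worse or not when?"),
    ("relevance",   good, "What did you have for breakfast this morning?"),
    ("specificity", good, "Can you tell me more about how you feel?"),
]

def to_dpo_records(pairs):
    """Convert to the (prompt, chosen, rejected) layout that common
    preference-optimization trainers expect."""
    prompt = "Ask the single most useful follow-up question for the patient."
    return [
        {"prompt": prompt, "chosen": chosen, "rejected": rejected,
         "attribute": attribute}
        for attribute, chosen, rejected in pairs
    ]
```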
Source: Conversation Agents, Conversation Topics Dialog, Conversation Architecture Structure
Related concepts in this collection
- Why can't conversational AI agents take the initiative?
Explores whether current LLMs lack the structural ability to lead conversations, set goals, or anticipate user needs—and what architectural changes might enable proactive dialogue.
proactive critical thinking is a specific, trainable instantiation of the general proactivity gap
- Can models identify what information they actually need?
When a reasoning task is missing a key piece of information, can language models recognize what's absent and ask the right clarifying question? QuestBench tests this capability directly.
QuestBench confirms: well-specified reasoning ≠ missing-information detection
- When does explicit reasoning actually help model performance?
Explicit reasoning improves some tasks but hurts others. What determines whether step-by-step reasoning chains are beneficial or harmful for a given problem?
thinking mode degrades proactive questioning in vanilla models (another case of reasoning-type mismatch)
- Can models learn to ask genuinely useful clarifying questions?
Explores whether question-asking quality is teachable through decomposing it into specific attributes like clarity and relevance, rather than treating it as a monolithic skill.
ALFA provides the quality methodology for making clarifications effective
- Which clarifying questions actually improve user satisfaction?
Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.
question form matters as much as the decision to ask
- Can agents learn to reason better without just chasing rewards?
Explores whether reinforcement learning can train agents to exhibit genuine metacognitive reasoning—planning, reflection, exploration, monitoring—rather than simply optimizing for task success through any means necessary.
complementary metacognitive RL: RLVMR trains monitoring/reflection during agentic execution; proactive critical thinking trains missing-information detection before reasoning begins; both operationalize metacognition as trainable RL objectives
- When should AI agents ask users instead of just searching?
Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
CA's insert-expansion framework provides the conversational structure (pre-second, post-first) for deploying proactive questioning in dialogue contexts
- Why do reasoning models overthink ill-posed questions?
Explores why models trained for extended reasoning produce drastically longer, less useful responses to unanswerable questions—and whether this represents a fixable training deficit or inherent limitation.
describes the behavioral failure proactive critical thinking corrects: without training, models ruminate unproductively on missing-premise questions; RL training transforms counterproductive self-doubt into targeted clarification
- How can models select the most informative question to ask?
Explores whether simulating possible futures and scoring questions by information gain can identify which clarifying question would best reduce uncertainty—moving beyond just deciding whether to ask toward deciding what to ask.
complementary capability: proactive critical thinking detects THAT information is missing; UoT determines WHICH question most efficiently recovers it
- What makes strategic question-asking succeed or fail?
Explores whether excellent performance at multi-turn questioning requires one dominant skill or the coordinated interaction of multiple distinct capabilities. Matters because many real-world tasks (diagnosis, troubleshooting, clarification) depend on this ability.
20Q reveals the three-capability synergy needed beyond mere detection: state tracking, planning, and inductive reasoning must work together
- Why do language models lose performance in longer conversations?
Does multi-turn degradation stem from fundamental model limitations, or from misalignment between what users mean and what models assume? Understanding the root cause could guide better solutions.
the trainable capability complement to the Mediator-Assistant architecture: proactive questioning addresses the intent alignment gap from the capability side while the Mediator addresses it architecturally
- Can AI agents learn when they have something worth saying?
What if AI proactivity came from modeling intrinsic motivation to participate rather than predicting who speaks next? This explores whether a framework based on human cognitive patterns—internal thought generation parallel to conversation—can make agents genuinely responsive rather than passively reactive.
complementary proactivity approaches: proactive critical thinking trains the capability to detect missing information; Inner Thoughts provides the motivational architecture for deciding when to deploy it in social conversation contexts
- Can conversations themselves personalize without user profiles?
Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?
complementary uncertainty reduction: proactive critical thinking detects missing task information; curiosity reward reduces uncertainty about who the user is; both reward active information-seeking over passive response generation, but targeting different knowledge gaps (task-level vs user-level)
- Why do users drift away from their original information need?
When users know their knowledge is incomplete but cannot articulate what's missing, do they unintentionally shift topics? And can real-time systems detect this drift?
the user-side complement: proactive critical thinking trains the AI to detect missing information, but ASK shows users themselves cannot articulate what they lack; combining ASK detection (84% precision) with proactive questioning could intervene before topic drift compounds the underspecification
- Why do AI agents misalign with what users actually want?
UserBench explores how often AI models fully understand user intent across multi-turn interactions. The study reveals that human communication is underspecified, incremental, and indirect — traits that challenge current models to actively clarify goals.
UserBench quantifies the cost of absent proactive questioning: the <30% preference discovery rate confirms that current models lack the proactive critical thinking needed to surface underspecified user intents
Original note title: proactive critical thinking enables models to identify missing information and actively request clarification rather than passively refusing or hallucinating answers