Why do specific clarifying questions outperform rephrased versions of user needs?

This explores why clarifying questions that ask for a concrete missing detail ("What type of monitor?") beat questions that ask users to restate their goal ("What are you trying to do?").

The short version from the corpus: users engage most when they can foresee how their answer will change the result. A specific question ("What type of monitor?") signals a concrete information gap the system has already located, so answering feels like progress. A rephrase-your-need question ("What are you trying to do?") pushes the burden of structuring the problem back onto the user, who often came to the system because they couldn't structure it themselves (see "Which clarifying questions actually improve user satisfaction?").

Why does asking to rephrase fail? Other notes in the collection suggest the problem is less about question wording than about who is doing the scaffolding. When a model gives generic answers to vague queries, it is not merging audiences the way social-media "context collapse" does; it is defaulting to blended training priors because the user never supplied enough contextual scaffolding (see "Why do large language models produce generic responses to vague queries?"). A rephrase-your-need question asks the user to build that scaffolding from scratch. A specific facet question does the opposite: the system names the exact slot it needs filled, so the user only has to supply a value.
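The slot-naming idea above can be sketched in a few lines. This is a minimal illustration, not something taken from the notes: the facet schema, the `Request` type, and the `next_question` helper are all hypothetical names, and the slot list for a monitor request is an assumption chosen to match the running example.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical facet schema for a "buy a monitor" request. The slot names
# and question wordings are illustrative assumptions, not from the notes.
FACET_QUESTIONS = {
    "monitor_type": "What type of monitor?",
    "budget": "What is your budget?",
}

# The question a system falls back to when it has not located any gap.
GENERIC_FALLBACK = "What are you trying to do?"


@dataclass
class Request:
    text: str
    facets: dict = field(default_factory=dict)  # slots already filled


def missing_facets(request: Request) -> list:
    """Return the schema slots that have no value yet, in schema order."""
    return [name for name in FACET_QUESTIONS if name not in request.facets]


def next_question(request: Request, schema_known: bool = True) -> Optional[str]:
    """Pick a clarifying question.

    If the system has located a gap, it asks for that one slot. If it has
    no schema for the request, it can only ask the user to restate the
    goal. Returns None when nothing is missing.
    """
    if not schema_known:
        return GENERIC_FALLBACK
    gaps = missing_facets(request)
    return FACET_QUESTIONS[gaps[0]] if gaps else None


request = Request("I need a monitor")
first = next_question(request)           # asks for the first unfilled slot
request.facets["monitor_type"] = "ultrawide"
second = next_question(request)          # moves on to the next gap
```

In this sketch the specific question is only available because `missing_facets` has already done the work of locating the gap; the generic fallback is what remains when that step has not happened.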

The deeper lesson is that question quality is not one thing; it decomposes. The ALFA framework breaks "good question" into theory-grounded attributes (clarity, relevance, specificity), trains on 80K attribute-specific preference pairs, and finds that attribute-specific optimization beats training on a single score, especially in clinical reasoning (see "Can models learn to ask genuinely useful clarifying questions?"). Specificity, in other words, is a measurable axis of question quality rather than a stylistic preference, which is consistent with specific questions winning on satisfaction.
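To make the decomposition concrete, here is a small sketch of how per-attribute preference pairs differ from a single-score pair. It is not ALFA's actual pipeline: the function names, the dictionary layout, and the 1-to-5 ratings are hypothetical values invented for illustration.

```python
# Attributes named in the note; everything else here is an assumption.
ATTRIBUTES = ("clarity", "relevance", "specificity")


def single_score_pair(a, b):
    """Build one (chosen, rejected) pair from the summed rating, or None on a tie."""
    total_a = sum(a["ratings"][attr] for attr in ATTRIBUTES)
    total_b = sum(b["ratings"][attr] for attr in ATTRIBUTES)
    if total_a == total_b:
        return None
    chosen, rejected = (a, b) if total_a > total_b else (b, a)
    return (chosen["text"], rejected["text"])


def attribute_pairs(a, b):
    """Build one (attribute, chosen, rejected) pair per attribute where ratings differ."""
    pairs = []
    for attr in ATTRIBUTES:
        rating_a, rating_b = a["ratings"][attr], b["ratings"][attr]
        if rating_a == rating_b:
            continue  # no preference signal on this attribute
        chosen, rejected = (a, b) if rating_a > rating_b else (b, a)
        pairs.append((attr, chosen["text"], rejected["text"]))
    return pairs


# Hypothetical ratings on a 1-5 scale, made up for this example.
specific = {
    "text": "What type of monitor?",
    "ratings": {"clarity": 4, "relevance": 4, "specificity": 5},
}
generic = {
    "text": "What are you trying to do?",
    "ratings": {"clarity": 5, "relevance": 3, "specificity": 1},
}

overall = single_score_pair(specific, generic)
per_attribute = attribute_pairs(specific, generic)
```

With these made-up ratings, the single-score pair simply prefers the specific question, while the per-attribute pairs also record that the generic question was rated clearer. That extra signal is the kind of information a single aggregate score discards.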

There is a prerequisite underneath all this: a system can only ask a specific question if it has first noticed *what* is missing. Several notes show this is a learnable but fragile skill. Models can be trained to spot missing information and request it rather than guess: reinforcement learning raised proactive critical-thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems (see "Can models learn to ask clarifying questions instead of guessing?"). Social meta-learning (SML) can also produce clarifying behavior without training on underspecified tasks: models trained on complete problems learn to treat conversation as a source of information rather than a place to dump an answer (see "Can models learn to ask clarifying questions without explicit training?"). Without that skill, reasoning models overthink ill-posed prompts instead of recognizing them as unanswerable (see "Why do reasoning models overthink ill-posed questions?").

The less obvious takeaway: the strength of a clarifying question is mostly inherited from a step that happens *before* the question is asked. A specific question is the visible output of a system that has already done the hard work of locating the gap. The corpus also notes that the right move depends on the question type itself, since different kinds of questions need different handling rather than one generic strategy (see "Does question type determine the right retrieval strategy?"). "Please rephrase" is what a system says when it hasn't done that work yet, and users can feel the difference.

Sources (7 notes)

Which clarifying questions actually improve user satisfaction?

Clarifying questions that target concrete information gaps ("What type of monitor?") consistently beat those that ask users to rephrase their needs ("What are you trying to do?"). Users engage most when they can foresee how answering improves results.

Why do large language models produce generic responses to vague queries?

Unlike social-media context collapse, which flattens multiple audiences, LLM collapse occurs when users provide insufficient contextual scaffolding and models default to blended training-data priors. This distinction suggests remedies should focus on query verification and user-driven context specification rather than platform controls.

Can models learn to ask genuinely useful clarifying questions?

The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.

Can models learn to ask clarifying questions instead of guessing?

Reinforcement learning training increased proactive critical thinking accuracy from 0.15% to 73.98% on deliberately flawed math problems. Notably, inference-time scaling degraded this ability in untrained models but improved it after RL training, suggesting the capability is learnable but fragile without explicit training.

Can models learn to ask clarifying questions without explicit training?

Models trained via SML on complete problems generalize to underspecified tasks by asking for needed information and delaying answers. The training paradigm instills a meta-strategy of using conversation as an information source, addressing the premature-answering failure mode.

Why do reasoning models overthink ill-posed questions?

Reasoning models generate redundant, lengthy responses to questions with missing premises while non-reasoning models correctly identify them as unanswerable. Training optimizes for producing reasoning steps but never teaches models when to disengage.

Does question type determine the right retrieval strategy?

Research shows non-factoid questions split into five types, each requiring different retrieval and aggregation methods. Evidence-based questions suit standard RAG, while debate and comparison need aspect-specific retrieval, and experience/reason questions need decomposition or filtering strategies.
