What makes specific-facet questions outperform generic need-rephrasing requests?
This explores why clarifying questions that name a concrete gap ('What screen size?') get better results than ones that throw the work back to the user ('What are you trying to do?').
The most direct answer in the corpus is that users engage when they can *foresee the payoff*: a specific-facet question shows them how answering will sharpen the result, while a rephrasing request makes them do the model's work without any visible reward (see "Which clarifying questions actually improve user satisfaction?"). The narrow question converts a vague intent into an actionable choice; the generic one just hands the ambiguity back.
The deeper reason becomes clear once you ask what "rephrase your need" is really compensating for. When a model produces a blurry response to a vague query, the failure is usually that the user never supplied enough scaffolding, so the model fell back on blended training-data priors, a kind of context collapse (see "Why do large language models produce generic responses to vague queries?"). A specific-facet question is the remedy aimed at exactly the right layer: it requests the one piece of scaffolding that's missing, rather than asking the user to rebuild the whole frame. "What type of monitor?" repairs a specific hole; "what are you trying to do?" assumes the user can diagnose their own context gap, which is the thing they couldn't do in the first place.
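The idea of requesting only the missing piece of scaffolding can be sketched as a small slot-filling routine. This is a minimal illustration under stated assumptions, not a method taken from the cited notes: the facet schema, the request type name, the `next_clarifying_question` helper, and the budget question are all hypothetical. Only the two monitor questions and the generic fallback wording come from the text above.

```python
from typing import Optional

# Hypothetical facet schema: the details a request type needs before it can
# be answered well. Request types and facet names are illustrative only.
REQUIRED_FACETS = {
    "monitor_recommendation": ["monitor_type", "screen_size", "budget"],
}

# One targeted question per facet. The budget wording is an assumption.
FACET_QUESTIONS = {
    "monitor_type": "What type of monitor?",
    "screen_size": "What screen size?",
    "budget": "What is your budget?",
}

GENERIC_FALLBACK = "What are you trying to do?"


def next_clarifying_question(request_type: str, known: dict) -> Optional[str]:
    """Return a question for the first missing facet, or None if none is missing."""
    required = REQUIRED_FACETS.get(request_type)
    if required is None:
        # No schema for this request: the system cannot diagnose the gap
        # itself, so it can only hand the ambiguity back to the user.
        return GENERIC_FALLBACK
    for facet in required:
        if facet not in known:
            # The system has done the diagnosis; the user fills one slot.
            return FACET_QUESTIONS[facet]
    return None


if __name__ == "__main__":
    print(next_clarifying_question("monitor_recommendation", {"monitor_type": "gaming"}))
```

In this sketch the generic question appears only when the system has no schema to diagnose against, which mirrors the argument: the fallback is what is left when the diagnostic work has not been done.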
There's also a quality-of-question story underneath. Work on training models to ask better questions found that you get further by decomposing "good question" into concrete attributes (clarity, relevance, specificity) and optimizing each, rather than chasing a single fuzzy quality score (see "Can models learn to ask genuinely useful clarifying questions?"). A specific-facet question scores high on all three at once; a need-rephrasing prompt is relevant but deliberately *un*specific, and specificity is precisely the attribute that drives engagement and downstream decision quality.
The corpus also hints that the right clarifying move depends on the kind of question being asked. Different question types demand different retrieval and decomposition strategies: a comparison needs aspect-specific retrieval, an experience question needs filtering (see "Does question type determine the right retrieval strategy?"). A specific-facet clarifier is effectively the model declaring which facet matters for *this* type of request, doing the classification work up front instead of offloading it. And there's a quieter mechanical bonus: a concrete, well-formed clarification keeps the conversation tight, which matters because reasoning quality degrades as inputs bloat, even well below the context limit (see "Does reasoning ability actually degrade with longer inputs?"). A rephrasing loop tends to balloon the exchange before any real information arrives.
The thing you didn't know you wanted to know: the win isn't politeness or phrasing, it's *who carries the diagnostic burden*. Specific-facet questions work because the model has already figured out what's missing and is asking the user to fill one slot. Need-rephrasing questions fail because they quietly demand that the user perform the very diagnosis they came to the model for help with.
Sources (5 notes)
Clarifying questions that target concrete information gaps ("What type of monitor?") consistently beat those that ask users to rephrase their needs ("What are you trying to do?"). Users engage most when they can foresee how answering improves results.
Unlike social-media context collapse, which flattens multiple audiences, LLM collapse occurs when users provide insufficient contextual scaffolding and models default to blended training-data priors. This distinction suggests remedies should focus on query verification and user-driven context specification rather than platform controls.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
Research shows non-factoid questions split into five types, each requiring different retrieval and aggregation methods. Evidence-based questions suit standard RAG, while debate and comparison need aspect-specific retrieval, and experience/reason questions need decomposition or filtering strategies.
FLenQA shows reasoning accuracy drops from 92% to 68% at just 3000 tokens of padding, far below context window capacity. The degradation is task-agnostic, uncorrelated with language modeling performance, and persists even with chain-of-thought prompting.