What makes a clarifying question aligned with user interests versus structurally sound?
This explores the gap between two ways of judging a clarifying question: whether it serves what the user actually wants (satisfaction, perceived usefulness) or whether it is well-formed by some internal measure (information gain, attribute correctness). It also looks at what the corpus says about when those two come apart.
This reads the question as a tension, not a definition: a clarifying question can be structurally optimal yet land badly, or feel helpful yet be technically loose. The corpus splits along exactly this seam. On the 'user interest' side, the cleanest signal is that specific, facet-targeting questions ('What type of monitor?') beat questions that ask users to rephrase their goal ('What are you trying to do?'). The reason is psychological, not formal: users engage when they can *foresee* how their answer improves the result (see "Which clarifying questions actually improve user satisfaction?"). Alignment here is about visible payoff, not correctness.
The 'structurally sound' camp answers a different question: how do you know a clarification is good *before* the user reacts? Two notes give machinery for this. One trains models by decomposing question quality into theory-grounded attributes (clarity, relevance, specificity) and optimizing each separately rather than chasing a single 'good question' score. This matters most in clinical reasoning, where the wrong question changes the decision (see "Can models learn to ask genuinely useful clarifying questions?"). The other is purely formal: simulate the possible answers to each candidate question and pick the question whose answers most reduce uncertainty, which makes information gain the definition of a worthwhile question (see "How can models select the most informative question to ask?"). Both are 'sound' without ever consulting whether the user feels served.
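The information-gain idea can be made concrete with a small sketch. This is not the method from the cited note, only a minimal illustration under loud assumptions: the intents, the prior, and the simulated answers are invented toy data, each hypothesis is assumed to produce exactly one answer, and the names `entropy`, `expected_information_gain`, and `candidates` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(dist):
    """Shannon entropy, in bits, of a {hypothesis: probability} mapping."""
    return sum(-p * math.log2(p) for p in dist.values() if p > 0)


def expected_information_gain(prior, answer_if_true):
    """Prior entropy minus the expected entropy left after hearing the answer.

    prior:          {hypothesis: probability}
    answer_if_true: {hypothesis: answer the user is simulated to give}
                    (one deterministic answer per hypothesis is an assumption)
    """
    # Group hypotheses by the answer they would produce.
    by_answer = defaultdict(dict)
    for hypothesis, p in prior.items():
        if p > 0:
            by_answer[answer_if_true[hypothesis]][hypothesis] = p

    expected_posterior_entropy = 0.0
    for group in by_answer.values():
        p_answer = sum(group.values())  # probability of hearing this answer
        posterior = {h: p / p_answer for h, p in group.items()}
        expected_posterior_entropy += p_answer * entropy(posterior)
    return entropy(prior) - expected_posterior_entropy


# Toy data, invented for illustration: four equally likely user intents.
prior = {"gaming": 0.25, "movies": 0.25, "office": 0.25, "photo_editing": 0.25}

# Simulated answers per candidate question (assumed, not measured).
candidates = {
    "What type of monitor?": {
        "gaming": "gaming", "movies": "movies",
        "office": "office", "photo_editing": "photo editing",
    },
    "Is it for work or play?": {
        "gaming": "play", "movies": "play",
        "office": "work", "photo_editing": "work",
    },
    # Modeled here as eliciting the same vague restatement from everyone.
    "What are you trying to do?": {h: "find a good monitor" for h in prior},
}

gains = {q: expected_information_gain(prior, m) for q, m in candidates.items()}
best_question = max(gains, key=gains.get)
```

Every number in this score comes from the system's own prior and answer model. That is the sense in which such a question is 'sound' without consulting the user, and also why a prior that omits the user's real source of confusion can still produce a high-scoring question.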
The corpus quietly exposes that these two can diverge. A note on retrieval shows that *causal* relevance and *semantic* relevance pull apart: what actually prompted a person's confusion is often not the passage that looks most similar to their words (see "Why do queries and their causes seem semantically different?"). Transposed to clarification, a question that is information-theoretically optimal under the system's model can still target the wrong cause of the user's uncertainty: structurally sound, but misaligned. The satisfaction note warns of the inverse failure: a question that maximizes user comfort by asking users to re-explain themselves feels collaborative but yields little.
Two more notes reframe the whole thing as situational rather than intrinsic. Question *type* determines the right strategy: evidence questions, comparison questions, and experience questions each demand different handling, so 'a good clarifying question' isn't a fixed object (see "Does question type determine the right retrieval strategy?"). The argument that explanation quality lives in the source-framing-recipient triad rather than in the explanation itself (see "What if XAI is fundamentally a communication problem?") applies directly: a clarifying question's value is co-produced by who asks it, how it is framed, and who receives it, not stamped into its grammar.
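The type-to-strategy pairing summarized in the question-type source note can be written as a simple lookup. This is a hedged sketch only: the enum, the strategy labels, and `pick_strategy` are hypothetical names chosen for illustration, and the mapping restates the note's summary rather than any published implementation.

```python
from enum import Enum


class QuestionType(Enum):
    """Non-factoid question types named in the source note."""
    EVIDENCE = "evidence"
    DEBATE = "debate"
    COMPARISON = "comparison"
    EXPERIENCE = "experience"
    REASON = "reason"


# Strategy labels are placeholders (assumed names), not real library calls.
STRATEGY_BY_TYPE = {
    QuestionType.EVIDENCE: "standard_rag",
    QuestionType.DEBATE: "aspect_specific_retrieval",
    QuestionType.COMPARISON: "aspect_specific_retrieval",
    QuestionType.EXPERIENCE: "decomposition_or_filtering",
    QuestionType.REASON: "decomposition_or_filtering",
}


def pick_strategy(question_type: QuestionType) -> str:
    """Return the handling strategy paired with a question type."""
    return STRATEGY_BY_TYPE[question_type]
```

The point of the table is that the "right" handling is a function of the question's type, so no single strategy, and by extension no single template for a clarifying question, fits every row.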
So the synthesis the reader might not expect: 'aligned' and 'sound' aren't two grades on the same scale; they answer to different audiences. Soundness is what the system can verify alone (attributes, expected information gain); alignment is what only the user can confirm (foreseeable payoff against *their* actual confusion). The most reliable bridge in this corpus is specificity that targets the real cause of uncertainty, because that is the rare property both camps reward at once. For the deeper rigor angle, the work on forcing models to check warrants and implicit premises via structured critical questions (see "Can structured argument prompts make LLM reasoning more rigorous?") shows what 'structurally sound' looks like pushed to its limit.
Sources (7 notes)
Clarifying questions that target concrete information gaps ("What type of monitor?") consistently beat those that ask users to rephrase their needs ("What are you trying to do?"). Users engage most when they can foresee how answering improves results.
The ALFA framework breaks down question quality into theory-grounded attributes (clarity, relevance, specificity) and trains models on 80K attribute-specific preference pairs. Attribute-specific optimization outperforms single-score training, especially in clinical reasoning where asking the right clarifying question directly impacts decision quality.
Uncertainty of Thoughts (UoT) combines uncertainty-aware scenario simulation with information-gain scoring and reward propagation to identify questions whose possible answers maximally reduce diagnostic uncertainty, providing a principled mechanism for specific, high-value clarification rather than generic prompts.
Backtracing—finding what caused a query—diverges from semantic similarity especially in conversation and lecture domains. Students ask about projection after hearing a specific statement, but the semantically closest passage discusses projection matrices instead, showing that surface similarity misses the actual cause.
Research shows non-factoid questions split into five types, each requiring different retrieval and aggregation methods. Evidence-based questions suit standard RAG, while debate and comparison need aspect-specific retrieval, and experience/reason questions need decomposition or filtering strategies.
Explanation quality is not intrinsic to the explanation itself but depends on the rhetorical situation: who presents it, how it is framed, and what role the recipient plays. Evaluations that ignore this triad measure only a narrow slice of real-world effectiveness.
Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.