Conversational AI Systems Design & LLM Interaction · Language Understanding and Pragmatics

Can models learn to ask genuinely useful clarifying questions?

Explores whether question-asking quality is teachable through decomposing it into specific attributes like clarity and relevance, rather than treating it as a monolithic skill.

Note · 2026-02-22 · sourced from Conversation Topics Dialog

The ALFA (Aligning LLMs to Ask) framework addresses a specific capability gap: LLMs fail to ask effective questions under uncertainty, making them unreliable in domains where proactive information-gathering is essential for decision-making.

The framework has three components:

  1. Decompose — break "good question" down into theory-grounded attributes (e.g., clarity, relevance, specificity)
  2. Synthesize — controllably generate attribute-specific question variations, yielding 80K preference pairs (a minimal sketch follows this list)
  3. Align — apply preference-based optimization so the model learns to ask better questions along each fine-grained attribute
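
A minimal sketch of the Synthesize step, under stated assumptions: that the variations are counterfactuals which each degrade exactly one attribute of a seed question, with the `degrade` callable standing in for an LLM rewriting call. All names here are illustrative, not ALFA's actual API.

```python
from dataclasses import dataclass
from typing import Callable

ATTRIBUTES = ("clarity", "relevance", "specificity")

@dataclass
class PreferencePair:
    attribute: str  # the single quality axis this pair isolates
    context: str    # dialogue context the question responds to
    chosen: str     # seed question, all attributes intact
    rejected: str   # counterfactual degraded along `attribute` only

def synthesize_pairs(
    context: str,
    seed_question: str,
    degrade: Callable[[str, str], str],  # (question, attribute) -> worse question, e.g. via an LLM
) -> list[PreferencePair]:
    """Build one attribute-specific preference pair per attribute."""
    return [
        PreferencePair(attr, context, seed_question, degrade(seed_question, attr))
        for attr in ATTRIBUTES
    ]
```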

Applied to clinical reasoning using the MediQ-AskDocs dataset (17K real-world clinical interactions), ALFA demonstrates that question quality is not unitary — a question can be clear but irrelevant, or relevant but ambiguous. Decomposing quality into attributes and training against each one produces better overall question-asking than optimizing for a single "question quality" score.
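
To make attribute-wise training concrete, here is a hedged sketch that assumes a DPO-style preference objective computed per attribute and then averaged; ALFA's actual objective may differ, and all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss over one batch of (chosen, rejected) log-probs."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

def attribute_aligned_loss(batches_by_attribute, beta=0.1):
    """Average DPO losses over attribute-specific preference batches, so
    each quality axis (clarity, relevance, ...) contributes its own signal."""
    losses = [
        dpo_loss(b["policy_chosen"], b["policy_rejected"],
                 b["ref_chosen"], b["ref_rejected"], beta)
        for b in batches_by_attribute.values()
    ]
    return torch.stack(losses).mean()
```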

The clinical domain makes the stakes concrete: a doctor who asks the wrong clarifying question may miss a critical symptom. Models that excel at static medical QA benchmarks still fail at the interactive task of gathering missing information through conversation. Where Can models learn to ask clarifying questions instead of guessing? establishes that models should ask rather than guess, ALFA supplies the methodology for making those clarifying questions actually good, not just present.

This connects to the broader clarification design finding in Which clarifying questions actually improve user satisfaction?: the attribute decomposition explains why a question high on specificity and relevance but low on verbosity outperforms one that merely paraphrases the user's need. Attribute-specific training can target exactly the dimensions that matter.
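
A toy illustration of that trade-off (the weights and scores below are invented for the example, not measured): rank candidate clarifying questions by weighted attribute scores, rewarding specificity and relevance while penalizing verbosity.

```python
# Illustrative weights: reward specificity and relevance, penalize verbosity.
WEIGHTS = {"specificity": 1.0, "relevance": 1.0, "verbosity": -0.5}

def score(attribute_scores: dict[str, float]) -> float:
    return sum(WEIGHTS[a] * attribute_scores.get(a, 0.0) for a in WEIGHTS)

candidates = {
    "Where exactly is the pain, and when did it start?":
        {"specificity": 0.9, "relevance": 0.9, "verbosity": 0.2},
    "So you are saying you have some pain somewhere?":
        {"specificity": 0.2, "relevance": 0.6, "verbosity": 0.3},
}
best = max(candidates, key=lambda q: score(candidates[q]))
```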

PerQs provides practical validation of attribute-based question quality at scale. The Active Listening system populates prompt templates with 400+ real user interests (aggregated from ~39K anonymous user models) and uses an LLM to generate ~19K personalized Q&A pairs. Deployed in the Alexa Prize, personalized questions showed significant positive effects on perceived conversation quality. The interest-personalization dimension shows that "good questions" are not just structurally well-formed (ALFA's clarity, relevance, and specificity attributes) but also content-aligned with user interests, a dimension that attribute-specific training could incorporate as an additional quality axis.
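
A hedged sketch of a PerQs-style generation step, not the deployed Active Listening prompts: populate a template with one user interest and ask an LLM for a personalized Q&A pair. The template wording and the `complete` helper are assumptions.

```python
# Illustrative template; the deployed prompts are not reproduced here.
TEMPLATE = (
    "The user is interested in {interest}. Write one engaging question "
    "about {interest}, then a short, friendly answer."
)

def generate_personalized_qa(interest: str, complete) -> str:
    """`complete` is any text-completion callable, e.g. an LLM client."""
    return complete(TEMPLATE.format(interest=interest))

# Scaled over the aggregated interest list:
# qa_pairs = [generate_personalized_qa(i, llm_complete) for i in interests]
```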


Source: Conversation Topics Dialog


training models to ask good questions requires decomposing quality into theory-grounded attributes and aligning via attribute-specific preference optimization