Conversational AI Systems Design & LLM Interaction · Language Understanding and Pragmatics

Can models learn to ask genuinely useful clarifying questions?

Explores whether question-asking quality is teachable through decomposing it into specific attributes like clarity and relevance, rather than treating it as a monolithic skill.

Note · 2026-02-22 · sourced from Conversation Topics Dialog

The ALFA (Aligning LLMs to Ask) framework addresses a specific capability gap: LLMs fail to ask effective questions under uncertainty, making them unreliable in domains where proactive information-gathering is essential for decision-making.

The framework has three components:

  1. Decompose — break "good question" down into theory-grounded attributes (e.g., clarity, relevance, specificity)
  2. Synthesize — controllably generate attribute-specific question variations, yielding 80K preference pairs (a minimal sketch follows this list)
  3. Align — apply preference-based optimization so the model learns to ask better questions along each fine-grained attribute
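
A minimal sketch of the Synthesize step, under stated assumptions: that the variations are counterfactuals which each degrade exactly one attribute of a seed question, with the `degrade` callable standing in for an LLM rewriting call. All names here are illustrative, not ALFA's actual API.

```python
from dataclasses import dataclass
from typing import Callable

ATTRIBUTES = ("clarity", "relevance", "specificity")

@dataclass
class PreferencePair:
    attribute: str  # the single quality axis this pair isolates
    context: str    # dialogue context the question responds to
    chosen: str     # seed question, all attributes intact
    rejected: str   # counterfactual degraded along `attribute` only

def synthesize_pairs(
    context: str,
    seed_question: str,
    degrade: Callable[[str, str], str],  # (question, attribute) -> worse question, e.g. via an LLM
) -> list[PreferencePair]:
    """Build one attribute-specific preference pair per attribute."""
    return [
        PreferencePair(attr, context, seed_question, degrade(seed_question, attr))
        for attr in ATTRIBUTES
    ]
```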

Applied to clinical reasoning using the MediQ-AskDocs dataset (17K real-world clinical interactions), ALFA demonstrates that question quality is not unitary — a question can be clear but irrelevant, or relevant but ambiguous. Decomposing quality into attributes and training against each one produces better overall question-asking than optimizing for a single "question quality" score.
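
To make attribute-wise training concrete, here is a hedged sketch that assumes a DPO-style preference objective computed per attribute and then averaged; ALFA's actual objective may differ, and all names below are illustrative.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Standard DPO loss over one batch of (chosen, rejected) log-probs."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -F.logsigmoid(beta * margin).mean()

def attribute_aligned_loss(batches_by_attribute, beta=0.1):
    """Average DPO losses over attribute-specific preference batches, so
    each quality axis (clarity, relevance, ...) contributes its own signal."""
    losses = [
        dpo_loss(b["policy_chosen"], b["policy_rejected"],
                 b["ref_chosen"], b["ref_rejected"], beta)
        for b in batches_by_attribute.values()
    ]
    return torch.stack(losses).mean()
```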

The clinical domain makes the stakes concrete: a doctor who asks the wrong clarifying question may miss a critical symptom. Models that excel at static medical QA benchmarks still fail at the interactive task of gathering missing information through conversation. Where Can models learn to ask clarifying questions instead of guessing? establishes that models should ask rather than guess, ALFA supplies the methodology for making those clarifying questions actually good, not just present.

This connects to the broader clarification design finding in Which clarifying questions actually improve user satisfaction?: the attribute decomposition explains why a question high on specificity and relevance but low on verbosity outperforms one that merely paraphrases the user's need. Attribute-specific training can target exactly the dimensions that matter.
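
A toy illustration of that trade-off (the weights and scores below are invented for the example, not measured): rank candidate clarifying questions by weighted attribute scores, rewarding specificity and relevance while penalizing verbosity.

```python
# Illustrative weights: reward specificity and relevance, penalize verbosity.
WEIGHTS = {"specificity": 1.0, "relevance": 1.0, "verbosity": -0.5}

def score(attribute_scores: dict[str, float]) -> float:
    return sum(WEIGHTS[a] * attribute_scores.get(a, 0.0) for a in WEIGHTS)

candidates = {
    "Where exactly is the pain, and when did it start?":
        {"specificity": 0.9, "relevance": 0.9, "verbosity": 0.2},
    "So you are saying you have some pain somewhere?":
        {"specificity": 0.2, "relevance": 0.6, "verbosity": 0.3},
}
best = max(candidates, key=lambda q: score(candidates[q]))
```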

PerQs provides practical validation of attribute-based question quality at scale. The Active Listening system populates prompt templates with 400+ real user interests (aggregated from ~39K anonymous user models) and uses an LLM to generate ~19K personalized Q&A pairs. Deployed in the Alexa Prize, personalized questions showed significant positive effects on perceived conversation quality. The interest-personalization dimension shows that "good questions" are not just structurally well-formed (ALFA's clarity, relevance, and specificity attributes) but also content-aligned with user interests, a dimension that attribute-specific training could incorporate as an additional quality axis.
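
A hedged sketch of a PerQs-style generation step, not the deployed Active Listening prompts: populate a template with one user interest and ask an LLM for a personalized Q&A pair. The template wording and the `complete` helper are assumptions.

```python
# Illustrative template; the deployed prompts are not reproduced here.
TEMPLATE = (
    "The user is interested in {interest}. Write one engaging question "
    "about {interest}, then a short, friendly answer."
)

def generate_personalized_qa(interest: str, complete) -> str:
    """`complete` is any text-completion callable, e.g. an LLM client."""
    return complete(TEMPLATE.format(interest=interest))

# Scaled over the aggregated interest list:
# qa_pairs = [generate_personalized_qa(i, llm_complete) for i in interests]
```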


Source: Conversation Topics Dialog


training models to ask good questions requires decomposing quality into theory-grounded attributes and aligning via attribute-specific preference optimization