How can models select the most informative question to ask?
Explores whether simulating possible futures and scoring questions by information gain can identify which clarifying question would best reduce uncertainty—moving beyond just deciding whether to ask toward deciding what to ask.
Most work on clarifying questions addresses WHETHER to ask. Uncertainty of Thoughts (UoT) addresses WHAT to ask — and provides a principled, information-theoretic mechanism for selecting the optimal question.
The algorithm has three components working together:
- Uncertainty-aware simulation: the model generates multiple candidate questions, then simulates possible future scenarios for each — what might the user answer, and what would each answer imply? These simulations form a tree structure of possible futures.
- Information-gain rewards: each simulated path is scored by how much it reduces the model's uncertainty about the true answer. Questions whose possible answers would maximally distinguish between the remaining possibilities score highest.
- Reward propagation: expected rewards are computed across all simulated futures, allowing selection of the question with the highest expected information gain — the one that, averaged over possible answers, most reduces uncertainty.
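A minimal single-step sketch of this selection loop, in Python. Everything here is illustrative rather than the paper's implementation: the uniform belief over remaining possibilities and the `answer_model` callback (a hypothetical hook standing in for the LLM simulating how the user would answer if a given possibility were true) are assumptions, and the full UoT algorithm extends this over multi-step trees with propagated rewards.

```python
import math
from collections import defaultdict

def entropy(possibilities):
    """Shannon entropy of a uniform belief over the remaining possibilities."""
    n = len(possibilities)
    return math.log2(n) if n else 0.0

def expected_information_gain(question, possibilities, answer_model):
    """Score one candidate question by its expected reduction in entropy.

    answer_model(question, possibility) simulates the user's answer under
    the assumption that `possibility` is the true state (hypothetical hook;
    in UoT this simulation is performed by the LLM itself).
    """
    # Partition the possibility set by the answer each possibility implies.
    partitions = defaultdict(list)
    for p in possibilities:
        partitions[answer_model(question, p)].append(p)
    # Expected post-answer entropy, weighting each branch by its probability.
    h_after = sum(
        len(subset) / len(possibilities) * entropy(subset)
        for subset in partitions.values()
    )
    return entropy(possibilities) - h_after

def select_question(candidates, possibilities, answer_model):
    """Pick the candidate whose simulated answers most reduce uncertainty."""
    return max(
        candidates,
        key=lambda q: expected_information_gain(q, possibilities, answer_model),
    )
```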
The medical diagnosis framing makes the mechanism concrete: a patient does not volunteer all of their symptoms, so the doctor must decide which question to ask next. A question like "Do you have a fever?" partitions the diagnostic space differently than "Have you traveled recently?" UoT formalizes this: given the current possibility set (the diseases consistent with the symptoms reported so far), which question's possible answers would most effectively narrow that set?
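Running the sketch above on a toy version of this scenario shows the selection in action. The four-disease space and the answer table are stipulated purely for illustration, not medical fact:

```python
diseases = ["flu", "covid", "malaria", "cold"]

# Stipulated toy answers: which diseases would produce "yes" to each question.
answer_table = {
    "Do you have a fever?":        {"flu", "covid", "malaria"},
    "Have you traveled recently?": {"malaria", "covid"},
}

def toy_answer_model(question, disease):
    return "yes" if disease in answer_table[question] else "no"

questions = list(answer_table)
best = select_question(questions, diseases, toy_answer_model)
# The fever question splits the space 3/1 (about 0.81 bits of expected gain);
# the travel question splits it 2/2 (exactly 1.0 bit), so `best` is
# "Have you traveled recently?".
```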
This connects directly to proactive critical thinking. As "Can models learn to ask clarifying questions instead of guessing?" establishes, the gap that proactive critical thinking fills is DETECTING incompleteness. UoT fills the complementary gap: once incompleteness is detected, SELECTING the most informative question to ask. And as "Which clarifying questions actually improve user satisfaction?" argues, UoT provides the mechanism for generating specific-facet questions rather than generic "can you be more specific?" prompts: the information-gain criterion naturally selects for questions that target the highest-value information asymmetry.
The connection to test-time scaling is architectural: UoT is essentially test-time compute applied to question generation. The simulation-propagation loop trades inference-time computation for better question selection, analogous to how reasoning models trade computation for better answers. As "Can dialogue planning balance fast responses with strategic depth?" suggests, UoT's simulation-propagation loop could serve as the System 2 question-selection mechanism within dual-process dialogue planning: when uncertainty triggers the MCTS planner, information-gain scoring provides a principled criterion for which clarifying question to generate next. And as "Can tree search replace human feedback in LLM training?" observes, UoT's reward propagation across simulated futures is structurally analogous to MCTS backpropagation: both use tree search to extract quality signals from exploration of future states.
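For the multi-step case, here is a structural sketch of the propagation step. The tree schema and the max-over-follow-ups choice are assumptions of this sketch, not UoT's exact reward accumulation, but the bottom-up flow mirrors the MCTS-backpropagation analogy above:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A node in the simulated-futures tree (hypothetical schema).

    kind == "question": children are the possible user answers.
    kind == "answer":   prob is the simulated likelihood of this answer,
                        gain is the information gain realized by reaching it,
                        and children are follow-up candidate questions.
    """
    kind: str
    prob: float = 1.0
    gain: float = 0.0
    children: list = field(default_factory=list)

def propagate(node):
    """Expected cumulative reward, computed bottom-up like MCTS backprop.

    Question nodes take an expectation over simulated answers; answer nodes
    add their realized gain and assume the best follow-up question is asked.
    """
    if node.kind == "question":
        return sum(child.prob * propagate(child) for child in node.children)
    best_followup = max((propagate(child) for child in node.children), default=0.0)
    return node.gain + best_followup

# The planner then asks the root-level candidate question whose subtree has
# the highest propagated expected reward.
```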
Source: Question Answer Search
Related concepts in this collection
- Can models learn to ask clarifying questions instead of guessing?
  Exploring whether large language models can be trained to detect incomplete queries and actively request missing information rather than hallucinating answers or refusing to respond. This matters because conversational agents today remain passive, responding only when prompted.
  UoT provides the selection mechanism that proactive critical thinking needs: once missing information is detected, it identifies which question recovers it fastest.
- Which clarifying questions actually improve user satisfaction?
  Not all clarification helps equally. This explores whether asking users to rephrase their needs works as well as asking targeted questions about specific information gaps.
  The information-gain criterion naturally selects specific-facet questions over generic rephrasing.
- Can AI agents communicate efficiently in joint decision problems?
  When humans and AI must collaborate to solve optimization problems under asymmetric information, what communication patterns enable effective coordination? Current LLMs struggle with this; why?
  UoT operationalizes the asymmetric information problem: simulate what the user might know, then ask what most reduces the asymmetry.
- When should AI agents ask users instead of just searching?
  Explores whether tool-enabled LLMs should probe users for clarification when uncertain, rather than silently chaining tool calls that drift from intent. Examines conversation analysis patterns as a formal alternative.
  UoT provides the selection mechanism for deciding which insert-expansion to use.
- Can dialogue planning balance fast responses with strategic depth?
  Can a system use quick instinctive responses for familiar conversation contexts while activating deeper planning only when uncertainty demands it? This explores whether adaptive computation improves dialogue goal-reaching.
  UoT's simulation loop could serve as the System 2 question-selection mechanism when uncertainty triggers MCTS planning.
- Can tree search replace human feedback in LLM training?
  Explores whether Monte Carlo Tree Search can generate quality signals for self-improvement without expensive human annotations. Matters because annotation bottlenecks currently limit LLM scaling.
  Structural analogy: UoT's reward propagation across simulated futures parallels MCTS backpropagation of quality signals.
Original note title: uncertainty-aware question selection via information gain simulates possible futures to determine the optimal next question to ask