Design & LLM Interaction · Psychology and Social Cognition · LLM Reasoning and Architecture

How can models select the most informative question to ask?

Explores whether simulating possible futures and scoring questions by information gain can identify which clarifying question would best reduce uncertainty—moving beyond just deciding whether to ask toward deciding what to ask.

Note · 2026-02-22 · sourced from Question Answer Search
Why do AI agents fail to take initiative? · How should we allocate compute budget at inference time? · How should researchers navigate LLM reasoning research?

Most work on clarifying questions addresses WHETHER to ask. Uncertainty of Thoughts (UoT) addresses WHAT to ask — and provides a principled, information-theoretic mechanism for selecting the optimal question.

The algorithm has three components working together:

  1. Uncertainty-aware simulation: the model generates multiple candidate questions, then simulates possible future scenarios for each — what might the user answer, and what would each answer imply? These simulations form a tree structure of possible futures.

  2. Information-gain rewards: each simulated path is scored by how much it reduces the model's uncertainty about the true answer. Questions whose possible answers would maximally distinguish between remaining possibilities score highest.

  3. Reward propagation: expected rewards are computed across all simulated futures, allowing selection of the question with highest expected information gain — the one that, on average across possible answers, most reduces uncertainty.
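The three components can be sketched as a single selection loop. This is a minimal illustration, not the paper's implementation: in UoT an LLM generates the candidate questions and simulates answers, whereas here a deterministic `partition_fn` (a hypothetical stand-in) plays that role, and the possibility set is assumed uniform.

```python
import math
from collections import defaultdict

def entropy(possibilities):
    """Shannon entropy of a uniform distribution over the remaining possibilities."""
    n = len(possibilities)
    return math.log2(n) if n > 0 else 0.0

def expected_information_gain(question, possibilities, partition_fn):
    """Score a question by how much its possible answers are expected to shrink
    the possibility set. partition_fn(question, possibility) returns the answer
    that possibility would produce -- the stand-in for simulated futures."""
    h_before = entropy(possibilities)
    # Group possibilities by the answer they would yield (one tree branch per answer).
    branches = defaultdict(list)
    for p in possibilities:
        branches[partition_fn(question, p)].append(p)
    # Expected posterior entropy, weighting each answer branch by its probability.
    h_after = sum(
        (len(subset) / len(possibilities)) * entropy(subset)
        for subset in branches.values()
    )
    return h_before - h_after

def select_question(questions, possibilities, partition_fn):
    """Reward propagation collapses to an argmax over expected gains."""
    return max(
        questions,
        key=lambda q: expected_information_gain(q, possibilities, partition_fn),
    )
```

The key design point survives the simplification: the score of a question is an expectation over its answer branches, so a question whose answers split the possibility set evenly beats one whose likely answer leaves most possibilities standing.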

The medical diagnosis framing makes the mechanism concrete: a patient doesn't report full symptoms. The doctor must decide which question to ask next. A question like "Do you have a fever?" partitions the diagnostic space differently than "Have you traveled recently?" UoT formalizes this: given the current possibility set (diseases consistent with reported symptoms so far), which question's possible answers would most effectively narrow that set?
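The partition arithmetic behind that framing is easy to make concrete. In this toy sketch the four candidate diseases and the 2-vs-2 and 1-vs-3 splits are hypothetical numbers chosen for illustration, not figures from the paper:

```python
import math

def expected_gain(split_sizes, total):
    """Expected entropy reduction (in bits) when a question partitions a
    uniform possibility set of `total` items into branches of the given sizes."""
    h_before = math.log2(total)
    h_after = sum((n / total) * math.log2(n) for n in split_sizes if n > 0)
    return h_before - h_after

# Toy diagnostic space of four candidate diseases:
# "Do you have a fever?" splits the candidates 2 vs 2 -> 1.0 bit gained.
print(expected_gain([2, 2], 4))
# "Have you traveled recently?" splits them 1 vs 3 -> about 0.81 bits gained,
# so under this toy partition the fever question is the better one to ask next.
print(expected_gain([1, 3], 4))
```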

This connects directly to proactive critical thinking. As Can models learn to ask clarifying questions instead of guessing? establishes, the gap that proactive critical thinking fills is DETECTING incompleteness. UoT fills the complementary gap: once incompleteness is detected, SELECTING the most informative question to ask. And as Which clarifying questions actually improve user satisfaction? shows, UoT provides the mechanism for generating specific-facet questions rather than generic "can you be more specific?" prompts — the information-gain criterion naturally selects for questions that target the highest-value information asymmetry.

The connection to test-time scaling is architectural: UoT is essentially test-time compute applied to question generation. The simulation-propagation loop trades inference-time computation for better question selection, analogous to how reasoning models trade computation for better answers. As Can dialogue planning balance fast responses with strategic depth? suggests, UoT's simulation-propagation loop could serve as the System 2 question-selection mechanism within dual-process dialogue planning — when uncertainty triggers the MCTS planner, information-gain scoring provides a principled criterion for which clarifying question to generate next. And as Can tree search replace human feedback in LLM training? notes, UoT's reward propagation across simulated futures is structurally analogous to MCTS backpropagation — both use tree search to extract quality signals from exploration of future states.



