Uncertainty of Thoughts: Uncertainty-Aware Planning Enhances Information Seeking in Large Language Models

Paper · arXiv 2402.03271 · Published February 5, 2024
Question Answering · Search · Conversation Agents

In this work, we introduce Uncertainty of Thoughts (UoT), an algorithm to augment large language models with the ability to actively seek information by asking effective questions. UoT combines 1) an uncertainty-aware simulation approach, which enables the model to simulate possible future scenarios and how likely they are to occur, 2) uncertainty-based rewards motivated by information gain, which incentivize the model to seek information, and 3) a reward propagation scheme that selects the question maximizing the expected reward.

Many real-world tasks begin with incomplete information, leading to a growing need for LLMs that can actively seek the information they need to solve a task by asking questions in conversational settings. For example, in medical diagnosis, patients often do not initially report their symptoms in full detail. In such situations, a doctor's ability to ask effective questions is crucial, as a successful diagnosis often depends on revealing important details that the patient did not initially provide (Figure 1).

To enhance LLMs in actively seeking information, we introduce Uncertainty of Thoughts (UoT), a plug-and-play approach that improves LLMs' ability to ask useful questions by modeling their own uncertainty. UoT is a principled approach relying on uncertainty-based rewards motivated by information gain, which incentivize a model to seek information in a way that maximally reduces its uncertainty. To utilize these rewards, we develop an uncertainty-aware simulation framework, enabling the model to simulate possible future scenarios along with how likely they are to occur. Given these scenarios, we utilize a reward propagation scheme to select the optimal question to ask in a way that maximizes the expected reward.
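To make the information-gain intuition concrete, here is a minimal sketch of the reward for a yes/no question under a uniform prior over the remaining possibilities: the gain is the entropy of the current set minus the expected entropy after the answer splits it. This simplified uniform-prior formula is an illustration, not the paper's exact reward definition.

```python
import math

def entropy(n: int) -> float:
    """Entropy in bits of a uniform distribution over n possibilities."""
    return math.log2(n) if n > 0 else 0.0

def information_gain(n_total: int, n_yes: int) -> float:
    """Expected entropy reduction from a yes/no question that splits a
    uniform possibility set of size n_total into subsets of size
    n_yes and n_total - n_yes."""
    n_no = n_total - n_yes
    p_yes, p_no = n_yes / n_total, n_no / n_total
    expected_after = p_yes * entropy(n_yes) + p_no * entropy(n_no)
    return entropy(n_total) - expected_after
```

A question that splits 8 possibilities evenly (4 vs. 4) yields 1 bit of gain, while a question whose answer is already known (8 vs. 0) yields 0 bits, which is why even splits are preferred.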

The interaction between the Questioner and the Answerer occurs over multiple turns. For instance, the Questioner may ask, "Do you have a fever?", to which the Answerer responds, "Yes, I've had a high fever for the past two days." The Questioner then asks another question such as "Have you vomited?" This exchange continues either until the Questioner correctly determines the final answer, or the conversation reaches a maximum number of turns. At this point, the interaction ends, and the Questioner is successful if it has correctly determined the true option ω.

Most of the description of our approach focuses on the closed set scenario, in which we assume that the Questioner starts with knowledge of the possibility space Ω, e.g., the set of all possible diseases in medical diagnosis. In our extension (Section 2.7), we adapt our approach to the open set scenario, in which this knowledge is absent. Moreover, as the questioning progresses, we use an LLM to gradually refine this set of possibilities to those that are consistent with the answers given so far by the Answerer. Define the current possibility set Ωi as the subset of Ω that is consistent with all answers given by the Answerer before the start of the ith interaction step.
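The refinement of Ωi can be expressed as filtering Ω by a consistency check against the dialogue history. In the paper this judgment is made by an LLM; the `is_consistent` predicate below is an abstract placeholder for that call.

```python
def refine_possibility_set(omega, history, is_consistent):
    """Return the subset of omega consistent with every (question, answer)
    pair observed so far. `is_consistent(option, question, answer)` stands
    in for an LLM consistency judgment."""
    return {w for w in omega
            if all(is_consistent(w, q, a) for q, a in history)}
```

For instance, with a toy symptom table, answering "yes" to a fever question eliminates every candidate disease that does not involve fever.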

As we discuss further below, we focus on applications where answers can be grouped into a small number of semantically distinct categories (in our case, affirmative and negative responses), as this allows us to compute meaningful uncertainty metrics in a simpler way. Conceptually, our framework can straightforwardly be extended to allow for a wider selection of answers.

As Figure 2 shows, to effectively reduce uncertainty, our UoT method first generates multiple candidate questions and simulates possible futures for each one in the form of a tree structure. Next, uncertainty-based rewards, motivated by information gain, are used to assess the questions within the simulation. Finally, a reward propagation scheme computes the expected reward of asking each candidate question, allowing us to select the question with the highest expected reward to ask the Answerer.
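The three steps above can be sketched as a recursive expectation over the simulated tree: each node carries a local reward, each branch carries the probability of that answer, and the candidate whose subtree has the highest expected accumulated reward is chosen. This is a simplified expectation-style propagation under an assumed tree encoding, not the paper's exact scheme.

```python
def expected_reward(node):
    """Propagate rewards up a simulated question tree.
    A node is a dict with an optional 'reward' (float) and optional
    'children': a list of (probability, child_node) pairs."""
    r = node.get("reward", 0.0)
    children = node.get("children", [])
    if not children:
        return r  # Leaf: just its own reward.
    # Internal node: own reward plus probability-weighted child values.
    return r + sum(p * expected_reward(child) for p, child in children)

def select_question(candidates):
    """Pick the candidate question (a dict with 'q' and 'tree' keys,
    a hypothetical encoding) whose subtree maximizes expected reward."""
    return max(candidates, key=lambda c: expected_reward(c["tree"]))
```

A question whose yes/no branches both lead to high-reward states is preferred over one where only an unlikely branch is informative.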