Large Language Model Guided Tree-of-Thought
Fields Medal winner Terence Tao once shared his experience of solving hard math problems: “When I was a kid, I had a romanticized notion of mathematics, that hard problems were solved in Eureka moments of inspiration... With me, it’s always, Let’s try this. That gets me part of the way, or that doesn’t work. Now let’s try this. Oh, there’s a little shortcut here... You work on it long enough and you happen to make progress towards a hard problem by a back door at some point. At the end, it’s usually, oh, I’ve solved the problem.” The problem-solving process he describes is a tree-like thinking process, rather than a linear chain-of-thought [9]. The limitation of linear response generation is also apparent from a computational complexity perspective. The number of computation steps an auto-regressive LLM can perform is polynomial in its input length. Unless P = NP holds, which would contradict the widely accepted belief, there are problems in NP that are not solvable by auto-regressive LLMs.
Inspired by these two shortcomings of auto-regressive LLMs, we propose a novel framework that augments an LLM with several additional modules, including an automatic “prompter agent”. This framework employs a solution search strategy we call the Tree-of-Thought (ToT). This strategy solves a problem through a multi-round conversation between the LLM and the prompter agent. Figure 1a provides a visual description of the ToT search strategy, in which the LLM plays a crucial role in guiding the search for solutions. To make this concrete, assume the problem to be solved is an instance of the Sudoku puzzle. The “root” node represents the initial state, corresponding to the moment when a human mind has just read through the problem description and begins the thinking process. A blue node in the figure represents a valid partial solution, which the LLM can use as a basis to generate the next search step. In the context of Sudoku solving, this means presenting a partially filled Sudoku board to the LLM and letting it fill in a few more cells. The rationale is that an LLM like GPT-4 has been trained on a vast text corpus that includes many Sudoku puzzle solutions. Given a partially filled board, the LLM is likely able to recognize the pattern and provide useful insights on how to proceed according to the Sudoku rules. Hence, a search guided by the LLM is highly likely to be significantly more efficient than a brute-force search. In the figure, the search steps guided by the LLM are represented by solid arrows. However, the steps generated by the LLM are not guaranteed to be logically correct. Thus, we introduce a “checker module” to perform correctness checks. In Figure 1a, a gray node with an “X” marker represents a “dead end”, i.e., a partial solution that the checker module considers invalid. For Sudoku, this means the partially filled board violates the Sudoku rules.
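To illustrate the checker module's role, the following is a minimal sketch of a rule checker for a 4×4 Sudoku variant. The board encoding (lists of lists, with 0 marking an empty cell) and the function name are illustrative assumptions, not details of the paper's implementation.

```python
def is_valid_partial(board):
    """Return True if no filled cells on the partial board violate the Sudoku rules."""
    n = len(board)           # board side length (4 in this sketch)
    box = int(n ** 0.5)      # sub-grid side length (2 here)

    def no_duplicates(cells):
        # A group is valid if its non-empty cells are all distinct.
        filled = [c for c in cells if c != 0]
        return len(filled) == len(set(filled))

    # Check every row and every column.
    for i in range(n):
        row = board[i]
        col = [board[r][i] for r in range(n)]
        if not (no_duplicates(row) and no_duplicates(col)):
            return False
    # Check every box x box sub-grid.
    for br in range(0, n, box):
        for bc in range(0, n, box):
            sub = [board[br + r][bc + c] for r in range(box) for c in range(box)]
            if not no_duplicates(sub):
                return False
    return True

# A partially filled board with no rule violations -> True:
ok = is_valid_partial([[1, 2, 0, 0],
                       [0, 0, 1, 2],
                       [0, 1, 0, 0],
                       [0, 0, 0, 1]])
# A "dead end": duplicate 1s in the first row -> False:
bad = is_valid_partial([[1, 1, 0, 0],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0],
                        [0, 0, 0, 0]])
```

A checker of this kind is purely rule-based, so it can reject an invalid partial solution with certainty even when the LLM's proposed step looks plausible.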
If the current node is invalid, we need to return to a parent or an ancestor node in order to correct the mistake. This is coordinated by a module called the “ToT controller”, which oversees the ToT search. With this backtracking capability, the system can regenerate parts of the solution and thus recover from errors. Moreover, even when the current node is valid, if the search remains stuck at it for too long, the ToT controller can issue a backtrack signal to explore other possible solutions. This is similar to a scenario where a human mind realizes that a particular direction offers no viable path to the final solution, prompting her to change course and explore alternative routes. This process continues until either a full solution is found (represented by a green node in the figure), or a pre-specified maximum number of conversation rounds is reached.
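The overall search, with backtracking over valid partial solutions, can be sketched as a depth-first loop with an explicit stack. Here `llm_extend`, `check`, and `is_complete` are hypothetical callables standing in for the LLM's step generation, the checker module, and the full-solution test; the fixed round cap is a simplification of the ToT controller's actual backtracking policy.

```python
def tot_search(initial_state, llm_extend, check, is_complete, max_rounds=100):
    """Depth-first sketch of the ToT search loop with backtracking."""
    stack = [initial_state]            # the root node: the initial state
    for _ in range(max_rounds):        # cap on conversation rounds
        if not stack:
            return None                # search space exhausted
        state = stack.pop()            # popping implicitly backtracks
        if is_complete(state):
            return state               # full solution found (a green node)
        for child in llm_extend(state):
            if check(child):           # checker module prunes dead ends
                stack.append(child)
    return None                        # round limit reached without a solution

# Toy usage: build a length-3 string over {'a', 'b'} that never contains "aa".
result = tot_search(
    "",
    llm_extend=lambda s: [s + "a", s + "b"],   # stand-in for the LLM's proposed steps
    check=lambda s: "aa" not in s,             # stand-in for the checker module
    is_complete=lambda s: len(s) == 3,
)
```

The explicit stack makes backtracking automatic: whenever a branch's children are all pruned by the checker, the next pop resumes from an earlier valid node, mirroring the controller's backtrack signal.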