Demystifying Chains, Trees, and Graphs of Thoughts
The field of natural language processing (NLP) has witnessed significant progress in recent years, with a notable focus on improving large language model (LLM) performance through innovative prompting techniques. Among these, prompt engineering coupled with structures has emerged as a promising paradigm, with designs such as Chain-of-Thought, Tree of Thoughts, or Graph of Thoughts, in which the overall LLM reasoning is guided by a structure such as a graph. As illustrated with numerous examples, this paradigm significantly enhances the LLM’s capability to solve numerous tasks, ranging from logical or mathematical reasoning to planning or creative writing. To facilitate the understanding of this growing field and pave the way for future developments, we devise a general blueprint for effective and efficient LLM reasoning schemes. For this, we conduct an in-depth analysis of the prompt execution pipeline, clarifying and precisely defining its different concepts. We then build the first taxonomy of structure-enhanced LLM reasoning schemes. We focus on identifying the fundamental classes of harnessed structures, and we analyze the representations of these structures, the algorithms executed with them, and other aspects. We refer to these structures as reasoning topologies, because their representation becomes, to a degree, spatial, as they are contained within the LLM context.
Despite continuous advances in the size and cognitive power of LLMs, solving elaborate tasks with a single straightforward prompt yields imprecise or plainly incorrect results due to the left-to-right, one-token-at-a-time nature of generative Transformer models [195]. Therefore, recent works focus on guiding LLMs towards the final solution through intermediate steps. Examples of such schemes include Chain-of-Thought (CoT) [195], Tree of Thoughts (ToT) [213], Graph of Thoughts (GoT) [10], AutoGPT [160], ReAct [214], or LLMCompiler [110].
Despite all these advancements, state-of-the-art schemes still exhibit numerous limitations. First, they remain limited to relatively simple tasks such as the Game of 24.
Second, state-of-the-art prompting schemes often entail high inference costs [10], [213]. Third, designing, developing, maintaining, and extending these schemes is hard. On the one hand, this is due to the rapid development and enrichment of the “LLM ecosystem” that must be seamlessly integrated into the prompting pipeline; this includes retrieval-augmented generation (RAG), Internet access, executing Python scripts, fine-tuning, and others. On the other hand, different concepts related to LLM reasoning are not well defined, hindering the effective design of new, more powerful schemes. For example, while many schemes rely on the notion of an LLM thought, it is not clear how it relates to concepts such as a prompt.
To address the above issues, we first identify and crystallize fundamental building blocks and concepts in the general prompt execution pipeline. Then, we analyze and clarify these blocks and concepts in the context of recent schemes such as CoT, ToT, and GoT (contribution #1). Our study is based on a broad analysis of recent works on LLM reasoning. We then use the gained insights to develop a general blueprint and a taxonomy of LLM reasoning schemes, focusing on how the underlying structure of reasoning can be used to facilitate more efficient, effective, and productive prompting (contribution #2). For this, we observe that the reasoning process in many recent prompting schemes can be modeled as a graph. While the nature of interacting with the LLM is temporal, the representation of the graph structure behind the LLM reasoning is periodically merged with the LLM context, becoming – to a degree – spatial, thus forming different topologies. These topologies can be a plain path graph (as in CoT [195]), multiple parallel path graphs with a single root (as in CoT with Self-Consistency [190]), a tree (as in ToT [213]), or an arbitrary graph (as in GoT [10]). We then use our taxonomy to survey and analyze existing prompting schemes (contribution #3). We dissect these schemes into fundamental aspects such as the class of graphs (i.e., the topology) used to model the reasoning process, the representation of this reasoning, or the encoding of the reasoning schedule.
In the basic Input-Output (IO) prompting, the LLM provides a final reply immediately upon receiving the user’s initial prompt. There are no intermediate steps in the LLM reasoning. Chain topologies, introduced in Chain-of-Thought by Wei et al. [195], improve upon IO prompting by incorporating explicit intermediate “steps of reasoning” in addition to the input and output. Chain-of-Thought with Self-Consistency (CoT-SC) [190] improves upon CoT by introducing several independent reasoning chains, originating from the same initial input. Then, the best outcome from the final thoughts is chosen, according to a predefined function S. The driving idea is to harness the randomness within the LLM reasoning, as it can generate different thoughts from the same prompt.
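Under the simplifying assumption that the LLM is just a callable returning one final answer per sampled chain, CoT-SC can be sketched as follows. The names `cot_sc` and `mock_llm` are hypothetical, and majority voting over final answers stands in for one common choice of the scoring function S:

```python
import random
from collections import Counter

def cot_sc(prompt, llm, n_chains=5):
    """Chain-of-Thought with Self-Consistency sketch: sample several
    independent reasoning chains from the same initial prompt, then
    choose the best final thought; here the scoring function S is
    majority voting over the final answers."""
    finals = [llm(prompt) for _ in range(n_chains)]
    answer, _ = Counter(finals).most_common(1)[0]
    return answer

def mock_llm(prompt):
    # Hypothetical stand-in for a sampled (temperature > 0) LLM call:
    # the same prompt can yield different final answers.
    return random.choice(["9", "9", "9", "7"])
```

With a deterministic model the vote is trivial; the scheme only pays off when sampling randomness produces diverse chains, which is exactly the property CoT-SC exploits.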
Tree of Thoughts (ToT) [133], [213] alleviates the limitations of CoT by allowing the prompt to branch at any point of the chain of thoughts. Unlike in CoT-SC, different exploration paths are therefore not fully independent: a chain of thoughts can branch during the reasoning process to explore different options. A single tree node represents a partial solution. Based on a given node, the thought generator constructs a given number k of new nodes. Then, the state evaluator generates a score for each such new node. Depending on the use case, the evaluation can be conducted by the LLM itself, or it can harness human scores. Finally, the schedule of extending the tree is dictated by the utilized search algorithm (e.g., BFS or DFS).
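The interplay of the thought generator, the state evaluator, and the search schedule can be sketched as a BFS-style beam search over partial solutions. `generate` and `evaluate` are hypothetical stand-ins for the LLM-driven components, and the toy instantiation at the end replaces them with plain string and integer operations:

```python
def tot_bfs(root, generate, evaluate, k=3, beam=2, depth=3):
    """BFS-style Tree-of-Thoughts sketch: expand each kept node into
    k children via the thought generator, score every new node with
    the state evaluator, and keep only the `beam` best partial
    solutions per level."""
    frontier = [root]
    for _ in range(depth):
        children = [c for node in frontier for c in generate(node, k)]
        frontier = sorted(children, key=evaluate, reverse=True)[:beam]
    return max(frontier, key=evaluate)

# Toy instantiation: partial solutions are digit strings, and the
# "evaluator" is simply their numeric value.
generate_digits = lambda node, k: [node + str(d) for d in range(k)]
best = tot_bfs("", generate_digits, int, k=3, beam=2, depth=3)
```

Swapping the level-by-level loop for a stack-driven expansion would give the DFS variant of the schedule mentioned above.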
Finally, Graph of Thoughts (GoT) [10] enables arbitrary reasoning dependencies between generated thoughts. As in ToT, every thought can generate multiple child thoughts. However, each thought can also have multiple parents, forming an aggregation operation. Because GoT allows both branching (thoughts with out-degree > 1) and aggregation (thoughts with in-degree > 1), it can express – for example – reasoning patterns resembling dynamic programming, where GoT subgraphs are responsible for solving subproblems, which are then combined to form a final solution.
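A minimal sketch of such a pattern, with plain Python replacing the LLM calls: the input thought branches into sub-sequence thoughts, each subproblem is solved independently, and a single aggregation thought with several parents merges the results (`got_sort` is a hypothetical name, not part of the GoT implementation):

```python
import heapq

def got_sort(seq, chunk_size=2):
    """GoT-style sorting sketch: branching (the input thought has
    out-degree > 1 over its sub-sequences) followed by aggregation
    (the merge thought has in-degree > 1 over the sorted chunks)."""
    # Branching: split the input thought into child thoughts.
    chunks = [seq[i:i + chunk_size] for i in range(0, len(seq), chunk_size)]
    # Each child thought solves its subproblem independently.
    sorted_chunks = [sorted(c) for c in chunks]
    # Aggregation: a single thought with multiple parents merges them.
    return list(heapq.merge(*sorted_chunks))
```

The branch-solve-merge shape of this sketch is exactly a two-level dynamic-programming decomposition expressed as a directed acyclic graph of thoughts.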
To encompass all these cases, we define a thought to be a semantic unit of task resolution, i.e., a step in the process of solving a given task. All the above examples fall into this definition: a step in task resolution can be a statement, a plan, a text passage, a set of documents, or a sequence of numbers. We model thoughts with nodes; edges between nodes correspond to dependencies between these thoughts. The details of these dependencies are also use case specific. For example, when generating a paragraph of text, if a given paragraph y is a refined version of a previous version x, then x and y become nodes in the topology, and there is an edge from x to y indicating that y depends on x. If the task is to sort a sequence of numbers, and the strategy is based on splitting the sequence into sub-sequences, sorting them independently, and merging, then the initial sequence could be modeled as a node x, and the sub-sequences would form further nodes y, z, ..., with edges (x, y), (x, z), ... from x to all the nodes modeling sub-sequences. Now, a reasoning topology is a graph of these nodes and edges.
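The sorting example above can be made concrete as an explicit node-and-edge structure. The function below is a hypothetical sketch that records the sub-sequence nodes y0, y1, ... and the dependency edges (x, y0), (x, y1), ...:

```python
def build_split_topology(sequence, num_parts):
    """Model the input sequence as node x and each sub-sequence as a
    child node yi, with an edge (x, yi) recording that yi depends on x."""
    nodes = {"x": sequence}
    edges = []
    size = -(-len(sequence) // num_parts)  # ceiling division
    for i in range(num_parts):
        name = f"y{i}"
        nodes[name] = sequence[i * size:(i + 1) * size]
        edges.append(("x", name))
    return nodes, edges
```

The returned pair is precisely a reasoning topology in the sense defined above: a graph whose nodes are thoughts and whose edges are their dependencies.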
4.2 Semantic Roles of Thoughts & Topologies
Graph nodes can model different aspects of reasoning. For example, in writing tasks, some nodes model plans of writing a paragraph, while other nodes model the actual paragraphs of text. We refer to such aspects as different semantic roles. As already observed in the prompting literature [10], semantic roles can also be modeled with graph theory, namely with heterogeneous graphs. This enables harnessing a powerful machinery for novel LLM reasoning works.
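As a minimal illustration (not the representation used by any particular scheme), a heterogeneous topology can simply attach a role label to each node, so that plan nodes and paragraph nodes can be handled differently:

```python
from dataclasses import dataclass

@dataclass
class Thought:
    content: str
    role: str  # semantic role, e.g. "plan" or "paragraph"

# A two-node heterogeneous topology: the paragraph depends on its plan.
graph = {
    "nodes": {
        "p0": Thought("Outline: intro, method, results", "plan"),
        "t0": Thought("The introduction argues that ...", "paragraph"),
    },
    "edges": [("p0", "t0")],
}

def nodes_with_role(graph, role):
    """Select node identifiers whose thought carries a given role."""
    return [k for k, v in graph["nodes"].items() if v.role == role]
```

Role-typed nodes (and, analogously, role-typed edges) are what makes the machinery of heterogeneous graph theory applicable to LLM reasoning structures.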
4.3 Fundamental Use Cases of Thoughts & Topologies
We identify two fundamental use cases of thoughts and topologies: in-context examples and reasoning steps that bring us towards a solution. In a topology of thoughts, a node v is reachable from another node u, if there exists a path from u to v. If a node is reachable from the node modeling the input task statement, we call such a node a solution node, and the corresponding topology is a solution topology.
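The reachability condition can be checked directly on the edge list. The sketch below returns every node reachable from a given start node, so, when started from the node modeling the input task statement, it yields exactly the solution nodes:

```python
def reachable_from(edges, start):
    """Return all nodes v such that a directed path leads from `start`
    to v (iterative depth-first traversal over an adjacency list)."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, []).append(v)
    seen, stack = set(), [start]
    while stack:
        u = stack.pop()
        for v in adjacency.get(u, []):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen
```

Nodes outside this set, e.g. in-context examples attached to the prompt, never lie on a path from the task statement and thus do not belong to the solution topology.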
We also illustrate fundamental concepts introduced in these works, namely multi-step reasoning, zero-shot reasoning, planning & task decomposition, task preprocessing, iterative refinement, and tool utilization. We finish this section with a comparative analysis and illustrations of example topology representations.