Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts

Paper · arXiv 2310.14628 · Published October 23, 2023
Tasks: Planning · Prompts: Prompting

Large language models (LLMs) have proven effective with different prompting methods, such as Chain of Thought and Program of Thought, and we find that these methods strongly complement one another on math reasoning tasks. In this work, we propose XoT, an integrated problem-solving framework that prompts LLMs with diverse reasoning thoughts. For each question, XoT begins by selecting the most suitable method and then executes each method iteratively. Within each iteration, XoT actively checks the validity of the generated answer and incorporates feedback from external executors, allowing it to switch dynamically among different prompting methods. Through extensive experiments on 10 popular math reasoning datasets, we demonstrate the effectiveness of our proposed approach and thoroughly analyze the strengths of each module. Moreover, empirical results suggest that our framework is orthogonal to recent work that improves single reasoning methods and can further generalise to the logical reasoning domain.
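The plan, verify and switch loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `solve_with_switching`, the callables `plan`, `reasoners` and `verify`, and the convention that a reasoner returns `None` when external execution fails are all assumptions made for this example. In practice each callable would wrap an LLM prompt or a program executor.

```python
from typing import Callable, Dict, List, Optional, Tuple

# All names below are hypothetical; they are not taken from the paper's code.
Reasoner = Callable[[str], Optional[str]]  # returns None if execution failed


def solve_with_switching(
    question: str,
    plan: Callable[[str, List[str]], str],
    reasoners: Dict[str, Reasoner],
    verify: Callable[[str, str], bool],
) -> Tuple[Optional[str], List[str]]:
    """Try methods one at a time, switching whenever an answer fails checks."""
    remaining = list(reasoners)      # methods not tried yet
    tried: List[str] = []            # order in which methods were used
    last_answer: Optional[str] = None
    while remaining:
        # Plan: choose the most suitable method among those still available.
        method = plan(question, remaining)
        if method not in remaining:  # guard against an invalid choice
            method = remaining[0]
        remaining.remove(method)
        tried.append(method)
        # Reason: produce an answer with the chosen method.
        answer = reasoners[method](question)
        if answer is None:           # executor feedback: failure, so switch
            continue
        last_answer = answer
        # Verify: stop on the first answer that passes the check.
        if verify(question, answer):
            break
    return last_answer, tried


# Toy usage with stand-in callables instead of real LLM calls.
toy_reasoners: Dict[str, Reasoner] = {
    "pot": lambda q: None,   # pretend the generated program crashed
    "cot": lambda q: "7",
}
answer, tried = solve_with_switching(
    "What is 3 + 4?",
    plan=lambda q, methods: methods[0],
    reasoners=toy_reasoners,
    verify=lambda q, a: a == "7",
)
```

In this toy run the first method fails at execution time, so the loop switches to the second method, whose answer passes verification. If no answer passes verification, the sketch falls back to the last answer produced, which is one possible design choice among several.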

The design principle underlying XoT is its ability to switch methods adaptively, which allows smooth integration with research aimed at improving individual methods. Iterative refinement methods enhance model performance by asking the model to rethink its previous response, and they can serve as a good alternative for the reasoning module in XoT. Specifically, before moving on to another method at each iteration, we allow the model to first refine its answer within its current approach, making the best use of the current method. Inspired by previous work (Madaan et al., 2023), after reasoning with one method for the first time, we require the model to analyze its response line by line and summarize several pieces of advice to mitigate potential errors. Then, the model answers the question a second time with the same method, using the summarized advice as a hint. After that, we verify the results produced by the second round and determine whether to switch to another method.
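One iteration of this refine-then-verify step might look like the sketch below. It is illustrative only: `refine_then_verify` and its callables `reason`, `critique` and `verify` are hypothetical names introduced here, and the real prompts and verification logic are not specified in this excerpt.

```python
from typing import Callable, Optional, Tuple


def refine_then_verify(
    question: str,
    reason: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
) -> Tuple[str, bool]:
    """Run one method twice (plain, then with advice) and decide on switching.

    Returns (second_round_answer, should_switch). All callables are
    hypothetical stand-ins for LLM prompts or external checks.
    """
    # Round 1: answer with the current method and no hint.
    first = reason(question, None)
    # Critique: review the first response and summarize advice.
    advice = critique(question, first)
    # Round 2: answer again with the same method, using the advice as a hint.
    second = reason(question, advice)
    # Only the second-round result is verified; a failure triggers a switch.
    should_switch = not verify(question, second)
    return second, should_switch


# Toy usage: the hint fixes an arithmetic slip made in the first round.
result, should_switch = refine_then_verify(
    "What is 2 + 2?",
    reason=lambda q, hint: "5" if hint is None else "4",
    critique=lambda q, response: "Re-check the addition step.",
    verify=lambda q, a: a == "4",
)
```

Here the second-round answer passes the check, so `should_switch` is `False` and the current method's answer would be kept.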