Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability. However, these methods do not fully exploit the answers generated by the LLM to guide subsequent responses. This paper proposes a new prompting method, named Progressive-Hint Prompting (PHP), that enables multiple automatic rounds of interaction between users and LLMs, using previously generated answers as hints to progressively guide the model toward the correct answer. PHP is orthogonal to CoT and self-consistency, making it easy to combine with state-of-the-art techniques to further improve performance.
Prompt engineering in large-scale models has shown comparable or superior performance to full training set fine-tuning in enhancing reasoning ability, while also being significantly more sample-efficient [6, 7]. One line of research enhances reasoning through Chain-of-Thought (CoT) approaches that elicit intermediate reasoning steps [8–10]; subsequent works such as Least-to-Most [9] and Complex CoT [10] further develop this direction. Another line of research comprises self-consistency-related approaches. Whereas CoT-related work focuses on designing better prompts, self-consistency samples multiple answers from the LLM and arrives at the final answer through a majority vote [10]; this approach is further improved by complexity-based selection [10]. CoT-related and self-consistency-related methods are complementary and can be combined seamlessly.
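For illustration, the following is a minimal sketch of self-consistency aggregation, assuming a hypothetical sampler `sample_cot` that draws one chain-of-thought completion from the LLM and returns the reasoning path together with the extracted final answer; the sampler itself is not part of the original description.

```python
from collections import Counter
from typing import Callable, Tuple

def self_consistency(
    sample_cot: Callable[[str], Tuple[str, str]],  # hypothetical: question -> (reasoning path, final answer)
    question: str,
    num_samples: int = 20,
) -> str:
    """Sample several chain-of-thought completions and majority-vote the final answers."""
    answers = []
    for _ in range(num_samples):
        _reasoning, answer = sample_cot(question)  # one temperature-based sample from the LLM
        answers.append(answer)
    # The most frequent final answer across the sampled reasoning paths is returned.
    most_common_answer, _count = Counter(answers).most_common(1)[0]
    return most_common_answer
```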
Prior research has not explored the potential of leveraging the outputs of LLMs to refine reasoning paths iteratively. It stands to reason that, similar to human cognition, an LLM could benefit from re-evaluating and adjusting its generated reasoning paths in order to correct errors and enhance overall performance. In this paper, we propose a new method named Progressive-Hint Prompting (PHP) that involves sequentially interacting with the LLM to approach the correct answer gradually: (1) given a question, we ask the LLM for a base answer; (2) we combine the question and the answer as a hint to re-ask the LLM and obtain a subsequent answer; (3) we repeat the operation in (2) until the answer is stable, i.e., it does not change over the last two answers.
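A minimal sketch of this loop is shown below, assuming a hypothetical single-call interface `ask_llm` that maps a prompt to the model's final answer; the exact hint phrasing is an illustrative choice, not the definitive prompt used in our experiments.

```python
from typing import Callable, Optional

def progressive_hint_prompting(
    ask_llm: Callable[[str], str],  # hypothetical interface: prompt -> final answer string
    question: str,
    max_rounds: int = 10,
) -> Optional[str]:
    """Re-ask the LLM with previously generated answers as hints until the answer is stable."""
    hints: list[str] = []            # answers from earlier rounds, reused as hints
    previous_answer: Optional[str] = None
    for _ in range(max_rounds):
        if hints:
            # Assumed hint format appended to the question (illustrative only).
            prompt = f"{question} (Hint: the answer is near to {', '.join(hints)})"
        else:
            prompt = question        # first round: ask for the base answer with no hint
        answer = ask_llm(prompt)
        # Stop once the answer does not change over the last two rounds.
        if answer == previous_answer:
            return answer
        hints.append(answer)
        previous_answer = answer
    return previous_answer
```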
Among the diverse range of language comprehension tasks, we are particularly interested in multi-step reasoning because it exhibits two unique features. Firstly, LLMs significantly outperform smaller models on multi-step reasoning tasks [8], whereas their performance gains on tasks like sentiment classification can be limited [19]. Secondly, few-shot prompting outperforms full training set fine-tuning on multi-step reasoning tasks, even when the fine-tuning is performed on LLMs [7].
Previous research has investigated various task-specific methods for identifying reasoning paths, including constructing semantic graphs [22], training Recurrent Neural Network (RNN) models to retrieve reasoning paths from a Wikipedia graph [23], fine-tuning on human-annotated reasoning paths for math problems [12], and training an extractor with heuristic-based pseudo reasoning paths [24]. A more recent work, Self-Consistency [25], couples the generation of reasoning paths and a final answer by sampling from the decoder and aggregating the samples to retrieve the most consistent answer, without requiring extra modules. This approach has shown great promise and can outperform prior methods in accuracy.