Instance-adaptive Zero-shot Chain-of-Thought Prompting
The efficacy of a single, task-level prompt applied uniformly across all instances is inherently limited, since no one prompt suits every instance; a more appropriate approach considers the interaction between the prompt and each instance carefully. This work introduces an instance-adaptive prompting algorithm as an alternative zero-shot CoT reasoning scheme that adaptively differentiates good prompts from bad ones. Concretely, we first analyze LLMs through the lens of information flow to uncover the mechanism underlying zero-shot CoT reasoning, and discover that the information flows from question to prompt and from question to rationale jointly exert the greatest influence on the reasoning result. We observe that better zero-shot CoT reasoning requires the prompt to first acquire semantic information from the question; the rationale then aggregates sufficient information from the question both directly and indirectly via the prompt.
Plan-and-Solve [5] employs a human-crafted prompt to break down the question and automatically generate reasoning steps. OPRO [22] treats the LLM as an optimizer that iteratively updates a zero-shot CoT prompt, producing optimized prompts for a given task. Self-Discover [24] selects relevant atomic reasoning modules (e.g., breaking down problems, critical thinking) for a given task, then adapts and customizes those modules to fit the task.
This simple question can be answered correctly under "Don't think. Just feel.", which is generally regarded as a less favorable prompt, whereas "Let's think step by step" leads the LLM into faulty reasoning at some steps.
Nevertheless, a severe challenge remains in choosing a suitable prompt for each instance: the difficulty of understanding why some reasoning processes succeed while others fail. To meet this challenge, we set out to uncover the mechanism of zero-shot CoT, which remains poorly understood [9, 16, 12, 18]. Neuron saliency score analysis is an important approach for observing the information flow during model inference [19–21, 32], through which we can glimpse the dynamic reasoning process at certain steps. After a comprehensive investigation across several LLMs and tasks, we find that a successful reasoning procedure tends to satisfy the following conditions: the semantic information of the question is first aggregated into the prompt, and the reasoning steps then gather information from both the original question and the synthesized question-prompt semantics. Otherwise, the reasoning is more likely to fail. This saliency phenomenon accords with human intuition: since the question is the starting point of reasoning, one must first understand it, then solve it following the rules within the prompt while always attending to the question itself.
There are three main components in zero-shot CoT: the question q, the prompt p, and the rationale r, and we need a proper tool to analyze the semantic information interactions among these components. The saliency score is common practice for analyzing information flow in in-context learning [19, 21], and we adapt it to CoT reasoning to observe the information flow in the zero-shot setting.
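To make the analysis concrete, the following is a minimal sketch of how a saliency-based information-flow measurement might look. It follows the common convention of taking the element-wise product of an attention matrix with its gradient, summed over heads, and then averaging the resulting scores over token spans (question → prompt, question → rationale, prompt → rationale). The span boundaries, the random stand-in tensors, and the function names are all hypothetical; in practice the attention matrices and their gradients would come from a backward pass through the LLM.

```python
import numpy as np

def saliency(attn, grad):
    """Saliency score per token pair: |A * dL/dA|, summed over heads.
    attn, grad: arrays of shape (heads, seq_len, seq_len)."""
    return np.abs(attn * grad).sum(axis=0)

def span_flow(sal, src, dst):
    """Mean saliency of attention from dst tokens (queries) back to
    src tokens (keys); src and dst are (start, end) index ranges."""
    return sal[dst[0]:dst[1], src[0]:src[1]].mean()

# Toy stand-ins for one layer's attention and its gradient;
# in a real setup these come from the model's forward/backward pass.
rng = np.random.default_rng(0)
n_heads, seq_len = 4, 12
attn = rng.random((n_heads, seq_len, seq_len))
grad = rng.random((n_heads, seq_len, seq_len))

sal = saliency(attn, grad)

# Hypothetical span boundaries: [question | prompt | rationale]
q, p, r = (0, 5), (5, 8), (8, 12)
S_qp = span_flow(sal, q, p)  # question -> prompt aggregation
S_qr = span_flow(sal, q, r)  # question -> rationale aggregation
S_pr = span_flow(sal, p, r)  # prompt -> rationale aggregation
```

Under this sketch, comparing S_qp, S_qr, and S_pr between correct and incorrect generations is what would reveal whether the question's semantics reached the prompt and the rationale as described above.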