In-Context Principle Learning from Mistakes

Paper · arXiv 2402.05403 · Published February 8, 2024

In-context learning (ICL, also known as few-shot prompting) has been the standard method of adapting LLMs to downstream tasks, by learning from a few input-output examples. Nonetheless, all ICL-based approaches only learn from correct input-output pairs. In this paper, we revisit this paradigm by learning more from the few given input-output examples. We introduce Learning Principles (LEAP): first, we intentionally induce the model to make mistakes on these few examples; then the model itself reflects on these mistakes and learns explicit task-specific “principles” from them without any human supervision, which help solve similar problems and avoid common mistakes; finally, we prompt the model to answer unseen test questions using the original few-shot examples and these learned general principles. We evaluate LEAP on a wide range of benchmarks, including multi-hop question answering (HotpotQA), textual QA (DROP), Big-Bench Hard reasoning, and math problems (GSM8K and MATH); in all these benchmarks, LEAP improves the strongest available LLMs such as GPT-3.5-turbo, GPT-4, GPT-4-turbo and Claude-2.1. For example, LEAP improves over standard few-shot prompting with GPT-4 by 7.5% on DROP and by 3.3% on HotpotQA.
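The three LEAP stages described above can be sketched as a small pipeline. This is a minimal illustration, not the authors' implementation: the `LLM` callable type, the prompt wording, and the mistake-induction strategy (sampling several answers at non-zero temperature and keeping those that disagree with the gold answer) are all assumptions made for the example. The `toy_llm` stub is a hypothetical deterministic stand-in so the sketch runs without any model API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical model interface (assumption): (prompt, temperature) -> text.
LLM = Callable[[str, float], str]


@dataclass
class Example:
    question: str
    answer: str


def induce_mistakes(llm: LLM, examples: List[Example], samples: int = 3,
                    temperature: float = 1.0) -> List[Tuple[Example, str]]:
    """Stage 1 (assumed strategy): sample answers for each few-shot example
    and keep every prediction that differs from the gold answer."""
    mistakes = []
    for ex in examples:
        for _ in range(samples):
            pred = llm(f"Q: {ex.question}\nA:", temperature).strip()
            if pred != ex.answer:
                mistakes.append((ex, pred))
    return mistakes


def learn_principles(llm: LLM, mistakes: List[Tuple[Example, str]]) -> List[str]:
    """Stage 2: the model reflects on each mistake (no human supervision)
    and states a general principle; duplicates are dropped."""
    principles: List[str] = []
    for ex, wrong in mistakes:
        prompt = (f"Question: {ex.question}\n"
                  f"Incorrect answer: {wrong}\n"
                  f"Correct answer: {ex.answer}\n"
                  "State one general principle that would avoid this mistake:")
        principle = llm(prompt, 0.0).strip()
        if principle and principle not in principles:
            principles.append(principle)
    return principles


def leap_prompt(examples: List[Example], principles: List[str],
                test_question: str) -> str:
    """Stage 3: original few-shot examples plus the learned principles."""
    rules = "\n".join(f"- {p}" for p in principles)
    shots = "\n\n".join(f"Q: {e.question}\nA: {e.answer}" for e in examples)
    return f"Principles:\n{rules}\n\n{shots}\n\nQ: {test_question}\nA:"


def toy_llm(prompt: str, temperature: float) -> str:
    # Hypothetical stub: reflection prompts get a fixed principle,
    # every question is answered "5" (so it errs on "2 + 2?").
    if prompt.startswith("Question:"):
        return "Check the arithmetic before answering."
    return "5"


examples = [Example("2 + 2?", "4"), Example("3 + 2?", "5")]
mistakes = induce_mistakes(toy_llm, examples)
principles = learn_principles(toy_llm, mistakes)
final_prompt = leap_prompt(examples, principles, "1 + 1?")
print(final_prompt)
```

Note that the final prompt uses exactly the same labeled examples as plain few-shot prompting; the only addition is the model-generated list of principles.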

Introduction. The rise of large language models (LLMs; Radford et al., 2019; Chowdhery et al., 2022; Zhang et al., 2022; Li et al., 2022; Anil et al., 2023; Touvron et al., 2023a;b) that are too costly to fine-tune for downstream tasks has led to the growing popularity of in-context learning (ICL), also known as few-shot prompting (Brown et al., 2020; Liu et al., 2023; Wei et al., 2023). In in-context learning, the LLM is provided with a few (e.g., three) task-specific input-output examples in its prompt, along with an unseen test input. Using this emergent ability (Wei et al., 2022b), the LLM is then expected to generate the output for the test input. The LLM generates this output by implicitly learning the task from the few given examples at inference time. ICL has been shown to be extremely effective and data-efficient across a variety of tasks and domains (Min et al., 2022a; Alayrac et al., 2022; Liu et al., 2021; Lu et al., 2023), mainly because it allows for downstream task adaptation without training. Further, ICL enables generalization using only a few annotated examples.
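As a concrete picture of the prompt described above, the sketch below concatenates three input-output demonstrations and an unseen test input into a single prompt; no model weights are updated. The `Input:`/`Output:` template and the sentiment examples are illustrative assumptions, not a format prescribed by the paper.

```python
from typing import List, Tuple


def few_shot_prompt(examples: List[Tuple[str, str]], test_input: str) -> str:
    """Build a few-shot prompt: k demonstrations, then the test input with an
    empty output slot for the model to complete (template is an assumption)."""
    blocks = [f"Input: {x}\nOutput: {y}" for x, y in examples]
    blocks.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(blocks)


# Hypothetical task: three sentiment-labeled demonstrations.
demos = [
    ("The movie was wonderful.", "positive"),
    ("I want my money back.", "negative"),
    ("A solid, enjoyable film.", "positive"),
]
prompt = few_shot_prompt(demos, "The plot made no sense.")
print(prompt)
```

The model is then expected to continue the text after the final `Output:`, inferring the task purely from the demonstrations.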

Discussion / Conclusion. In this paper, we introduce Learning Principles (LEAP), a novel approach that allows LLMs to learn more from the given few-shot examples by intentionally making mistakes on these examples, reflecting on the mistakes, and finally articulating explicit task-specific principles, which help avoid similar mistakes in the future. LEAP requires exactly the same number of labeled examples as few-shot prompting, and improves a variety of strong LLMs (GPT-3.5-turbo, GPT-4, GPT-4-turbo and Gemini Pro) across a broad range of reasoning tasks (DROP, HotpotQA, GSM8K, MATH, and Big-Bench Hard). We believe that LEAP unlocks new possibilities within the traditional setting of few-shot in-context learning, by learning from mistakes rather than from positive examples only.