Instruction Induction: From Few Examples to Natural Language Task Descriptions

Paper · arXiv 2205.10782 · Published May 22, 2022

The Instruction Paradigm. Efrat and Levy [2020] propose to learn new tasks from natural language instructions. Mishra et al. [2022] and Wang et al. [2022b] collect the crowdsourcing instructions used to create NLP datasets into a benchmark for measuring the ability to solve tasks by reading instructions. Recent work shows that fine-tuning on task instructions (instruction tuning) improves the zero-shot learning abilities of LMs [Sanh et al., 2022, Wei et al., 2022a, Ouyang et al., 2022]. This work focuses on models’ ability to generate instructions, rather than their ability to execute instructions written by humans.

Intermediate Reasoning Steps. Nye et al. [2022] show that LMs can perform complex computations by writing intermediate steps on a “scratchpad”. In chain-of-thought prompting [Wei et al., 2022b], input-output demonstrations are enriched with sentences elaborating intermediate reasoning steps, improving the performance of LMs on tasks that require reasoning skills. Subsequent work further improves performance on such tasks using a self-consistency ensemble [Wang et al., 2022a], which samples a set of diverse chain-of-thought reasoning paths and takes the majority vote over all generated answers. Zelikman et al. [2022] use a small set of examples labeled with chain-of-thought rationales and a large set of unlabeled data to iteratively bootstrap automatic rationale generation, creating a large rationale-labeled dataset that enables fine-tuning. In contrast, we study the ability of LMs to generate a description of the task, rather than generating intermediate reasoning steps as a means of executing complex tasks.
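To make the self-consistency idea concrete, here is a minimal sketch (not code from any of the cited papers): sample several chain-of-thought completions and take a majority vote over their final answers. The `sample_completion` callable is a hypothetical stand-in for any LM sampling API that returns one reasoning path ending in a line of the form "Answer: <answer>".

```python
from collections import Counter

def self_consistency_answer(question, sample_completion, n_samples=10):
    """Sample several chain-of-thought completions and majority-vote
    over their final answers (self-consistency, Wang et al. [2022a])."""
    answers = []
    for _ in range(n_samples):
        # Each call is assumed to sample one reasoning path at non-zero
        # temperature, ending in a line of the form "Answer: <answer>".
        reasoning = sample_completion(f"Q: {question}\nA: Let's think step by step.")
        for line in reversed(reasoning.splitlines()):
            if line.startswith("Answer:"):
                answers.append(line[len("Answer:"):].strip())
                break
    # Majority vote over all sampled final answers.
    return Counter(answers).most_common(1)[0][0] if answers else None
```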

8 Discussion

This work demonstrates that large LMs can not only infer new tasks from a handful of demonstrations, but also describe those tasks in natural language. We provide evidence of this ability on a diverse set of language tasks, and show that while instruction induction abilities currently emerge only in a single state-of-the-art model, that model does approach human performance on about half the tasks. It is reasonable to expect that models in the near future will process human-generated instructions even better, which makes the potential applications of instruction induction worth discussing. In particular, we envision a use case in which instruction induction serves as a machine learning approach in its own right: instead of converting a dataset into a set of continuous parameters, we could produce a natural language instruction that best describes the data. Grounding the model in concise natural language has the advantage of interpretability, and has the potential to address fundamental issues pertaining to spurious correlations. While it is still too early to determine whether this approach is viable, we view it as an intriguing direction for future research.

Instruction Induction

We begin by formulating the task of instruction induction. Given a sequence of $n$ demonstrations $\{(x_k, y_k)\}_{k \in \{1, \dots, n\}}$, the goal is to generate a single natural language instruction, such that for each $x_k$, following the instruction results in $y_k$. This format is similar to in-context learning [Brown et al., 2020], only here the desired output is an instruction describing the relation between the inputs and outputs of the demonstrations. We require models to perform this in a zero-shot setting, without any fine-tuning on labeled data. Figure 1 illustrates the difference between standard in-context prompting and instruction-induction prompting.
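As a minimal illustration of this objective, the sketch below (not from the paper) checks whether an induced instruction is consistent with the demonstrations, i.e., whether following it on each $x_k$ reproduces $y_k$. The `execute` callable is a hypothetical stand-in for whatever agent, model or human, follows the instruction on an input; exact string match is assumed for simplicity.

```python
def is_consistent(instruction, demonstrations, execute):
    """Return True if following `instruction` on every demonstration
    input x_k reproduces the paired output y_k (exact match assumed)."""
    return all(execute(instruction, x) == y for x, y in demonstrations)

# Hypothetical usage with toy demonstrations of an addition task:
demos = [("2 3", "5"), ("7 1", "8")]
instruction = "Add the two numbers and write their sum."
# consistent = is_consistent(instruction, demos, execute=some_executor)
```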

To elicit models to generate instructions, we consider prompts that would also elicit such instructions from humans. We design a meta-prompt that presents instruction induction as a challenge puzzle and verify its clarity in a human study (§3.3). The prompt is presented in Figure 1 (right side, in pink). While prior work already shows that large LMs are often able to infer a latent task from a given set of demonstrations, that evidence has been largely based on their ability to execute the task on a held-out example. Instruction induction instead requires the model to describe the underlying task in natural language.
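For illustration, the sketch below builds a meta-prompt in the spirit described here, framing the hidden instruction as a puzzle to be recovered from input-output pairs. The wording is a paraphrase written for this example only, not the verbatim prompt shown in Figure 1.

```python
def instruction_induction_prompt(demonstrations):
    """Build a puzzle-style meta-prompt asking for the hidden instruction
    behind a set of input-output demonstrations (paraphrased wording)."""
    header = (
        "A teacher gave a student an instruction and several inputs. "
        "The student read the instruction and wrote an output for each input.\n"
        "Here are the input-output pairs:"
    )
    pairs = "\n\n".join(f"Input: {x}\nOutput: {y}" for x, y in demonstrations)
    return f"{header}\n\n{pairs}\n\nThe instruction was:"

# Example with demonstrations of a pluralization task:
print(instruction_induction_prompt([("cat", "cats"), ("box", "boxes")]))
```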