Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x′ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies.
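The three steps of the pipeline described above (prompt addition, answer search, and answer mapping) can be sketched as follows. This is a minimal illustration, not an implementation from the paper: the template, answer space, label mapping, and especially the `score` function are hypothetical stand-ins; a real system would obtain scores from a pre-trained language model.

```python
def apply_template(x, template):
    """Prompt addition: wrap the input x in a template containing an unfilled [Z] slot."""
    return template.replace("[X]", x)

def score(filled_prompt):
    """Toy stand-in for a language model's score of a filled prompt.
    A real system would use a pre-trained LM's probability here."""
    # Hypothetical heuristic: favor "great" when the input expresses "love".
    if "love" in filled_prompt and "great" in filled_prompt:
        return 0.9
    return 0.1

def answer_search(prompt, answers):
    """Answer search: pick the answer z whose filled prompt x-hat scores highest."""
    return max(answers, key=lambda z: score(prompt.replace("[Z]", z)))

def answer_map(z, mapping):
    """Answer mapping: convert the predicted answer z into the final output y."""
    return mapping[z]

x = "I love this movie."
template = "[X] Overall, it was a [Z] movie."          # hypothetical template
answers = ["great", "terrible"]                        # hypothetical answer space
mapping = {"great": "positive", "terrible": "negative"}

prompt = apply_template(x, template)
z_hat = answer_search(prompt, answers)
y = answer_map(z_hat, mapping)
print(y)  # prints "positive" with the toy scorer
```

Note how the classification task is recast as a text-completion problem: no task-specific head is trained; only the template, answer space, and mapping change per task.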
Contents
Two Sea Changes in NLP
A Formal Description of Prompting
    Supervised Learning in NLP
    Prompting Basics
        Prompt Addition
        Answer Search
        Answer Mapping
    Design Considerations for Prompting
Pre-trained Language Models
    Training Objectives
    Noising Functions
    Directionality of Representations
    Typical Pre-training Methods
        Left-to-Right Language Model
        Masked Language Models
        Prefix and Encoder-Decoder
Prompt Engineering
    Prompt Shape
    Manual Template Engineering
    Automated Template Learning
        Discrete Prompts
        Continuous Prompts
Answer Engineering
    Answer Shape
    Answer Space Design Methods
        Manual Design
        Discrete Answer Search
        Continuous Answer Search
Multi-Prompt Learning
    Prompt Ensembling
    Prompt Augmentation
    Prompt Composition
    Prompt Decomposition
Training Strategies for Prompting Methods
    Training Settings
    Parameter Update Methods
        Promptless Fine-tuning
        Tuning-free Prompting
        Fixed-LM Prompt Tuning
        Fixed-prompt LM Tuning
        Prompt+LM Tuning
Applications
    Knowledge Probing
    Classification-based Tasks
    Information Extraction
    “Reasoning” in NLP
    Question Answering
    Text Generation
    Automatic Evaluation of Text Generation
    Multi-modal Learning
    Meta-Applications
    Resources
Prompt-relevant Topics
Challenges
    Prompt Design
    Answer Engineering
    Selection of Tuning Strategy
    Multiple Prompt Learning
    Selection of Pre-trained Models
    Theoretical and Empirical Analysis of Prompting
    Transferability of Prompts
    Combination of Different Paradigms
    Calibration of Prompting Methods
Meta Analysis
    Timeline
    Trend Analysis
Conclusion
Appendix on Pre-trained LMs
    Evolution of Pre-trained LM Parameters
    Auxiliary Objective
    Pre-trained Language Model Families