Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
This paper surveys and organizes research works in a new paradigm in natural language processing, which we dub “prompt-based learning”. Unlike traditional supervised learning, which trains a model to take in an input x and predict an output y as P(y|x), prompt-based learning is based on language models that model the probability of text directly. To use these models to perform prediction tasks, the original input x is modified using a template into a textual string prompt x′ that has some unfilled slots, and then the language model is used to probabilistically fill the unfilled information to obtain a final string x̂, from which the final output y can be derived. This framework is powerful and attractive for a number of reasons: it allows the language model to be pre-trained on massive amounts of raw text, and by defining a new prompting function the model is able to perform few-shot or even zero-shot learning, adapting to new scenarios with few or no labeled data. In this paper we introduce the basics of this promising paradigm, describe a unified set of mathematical notations that can cover a wide variety of existing work, and organize existing work along several dimensions, e.g., the choice of pre-trained models, prompts, and tuning strategies.
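The three steps of the pipeline described above (prompt addition, answer search, and answer mapping) can be sketched as follows. This is a minimal illustration, not an implementation from the paper: the template, answer space, label mapping, and especially the `score` function are hypothetical stand-ins; a real system would obtain scores from a pre-trained language model.

```python
def apply_template(x, template):
    """Prompt addition: wrap the input x in a template containing an unfilled [Z] slot."""
    return template.replace("[X]", x)

def score(filled_prompt):
    """Toy stand-in for a language model's score of a filled prompt.
    A real system would use a pre-trained LM's probability here."""
    # Hypothetical heuristic: favor "great" when the input expresses "love".
    if "love" in filled_prompt and "great" in filled_prompt:
        return 0.9
    return 0.1

def answer_search(prompt, answers):
    """Answer search: pick the answer z whose filled prompt x-hat scores highest."""
    return max(answers, key=lambda z: score(prompt.replace("[Z]", z)))

def answer_map(z, mapping):
    """Answer mapping: convert the predicted answer z into the final output y."""
    return mapping[z]

x = "I love this movie."
template = "[X] Overall, it was a [Z] movie."          # hypothetical template
answers = ["great", "terrible"]                        # hypothetical answer space
mapping = {"great": "positive", "terrible": "negative"}

prompt = apply_template(x, template)
z_hat = answer_search(prompt, answers)
y = answer_map(z_hat, mapping)
print(y)  # prints "positive" with the toy scorer
```

Note how the classification task is recast as a text-completion problem: no task-specific head is trained; only the template, answer space, and mapping change per task.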
Contents
Two Sea Changes in NLP
A Formal Description of Prompting
    Supervised Learning in NLP
    Prompting Basics
        Prompt Addition
        Answer Search
        Answer Mapping
    Design Considerations for Prompting
Pre-trained Language Models
    Training Objectives
    Noising Functions
    Directionality of Representations
    Typical Pre-training Methods
        Left-to-Right Language Model
        Masked Language Models
        Prefix and Encoder-Decoder
Prompt Engineering
    Prompt Shape
    Manual Template Engineering
    Automated Template Learning
        Discrete Prompts
        Continuous Prompts
Answer Engineering
    Answer Shape
    Answer Space Design Methods
        Manual Design
        Discrete Answer Search
        Continuous Answer Search
Multi-Prompt Learning
    Prompt Ensembling
    Prompt Augmentation
    Prompt Composition
    Prompt Decomposition
Training Strategies for Prompting Methods
    Training Settings
    Parameter Update Methods
        Promptless Fine-tuning
        Tuning-free Prompting
        Fixed-LM Prompt Tuning
        Fixed-prompt LM Tuning
        Prompt+LM Tuning
Applications
    Knowledge Probing
    Classification-based Tasks
    Information Extraction
    “Reasoning” in NLP
    Question Answering
    Text Generation
    Automatic Evaluation of Text Generation
    Multi-modal Learning
    Meta-Applications
    Resources
Prompt-relevant Topics
Challenges
    Prompt Design
    Answer Engineering
    Selection of Tuning Strategy
    Multiple Prompt Learning
    Selection of Pre-trained Models
    Theoretical and Empirical Analysis of Prompting
    Transferability of Prompts
    Combination of Different Paradigms
    Calibration of Prompting Methods
Meta Analysis
    Timeline
    Trend Analysis
Conclusion
Appendix on Pre-trained LMs
    Evolution of Pre-trained LM Parameters
    Auxiliary Objective
    Pre-trained Language Model Families