Exploring Format Consistency for Instruction Tuning
As outlined in Iyer et al. (2022), existing instruction formats exhibit variations across different datasets, which can be classified into three distinct hierarchical levels: Task-level format, Instance-level format, and Keywords-level format (as illustrated in Figure 2). We present an overview of existing instruction tuning datasets based on instruction formats in Table 1.
Task-level Format encompasses a comprehensive definition of a task and may include supplementary information such as positive or negative examples and explanations of the examples. Representative datasets are Ni-v2 (Wang et al., 2022b), Unnatural Instructions (Honovich et al., 2022a), and Alpaca (Taori et al., 2023).
Instance-level Format employs succinct templates that are customized for each individual example and is occasionally structured in a cloze-style format to elicit the intended output. Representative datasets are Flan (Wei et al., 2021) and PromptSource (Bach et al., 2022).
Keywords-level Format closely resembles the instance-level format, but it limits the instruction templates exclusively to keywords. CrossFit (Ye et al., 2021a) serves as a representative example of a keywords-level dataset.
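To make the three levels concrete, the following sketch renders a single hypothetical sentiment-classification instance at each level. The template wording, field names, and example text are illustrative inventions, not drawn verbatim from Ni-v2, Flan, PromptSource, or CrossFit.

```python
# Hypothetical example: one instance rendered at the three format levels.
# All template text below is illustrative, not taken from any cited dataset.

example = {"input": "The movie was a delight.", "output": "positive"}

# Task-level: a full task definition plus a positive demonstration with an
# explanation, in the spirit of Ni-v2-style prompts.
task_level = (
    "Definition: Given a sentence, classify its sentiment as positive or negative.\n"
    "Positive Example -- input: 'Great acting.' output: positive "
    "(the sentence praises the acting)\n"
    f"Now complete the task.\nInput: {example['input']}\nOutput:"
)

# Instance-level: a succinct per-example template, here in cloze style.
instance_level = f"{example['input']} The sentiment of this sentence is ___."

# Keywords-level: the template reduced to a bare keyword, CrossFit-style.
keywords_level = f"sentiment: {example['input']}"

for prompt in (task_level, instance_level, keywords_level):
    print(prompt, end="\n---\n")
```

Note how the verbosity of the surrounding template, not the underlying instance, is what distinguishes the levels: the same input sentence appears in all three prompts.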