Do Models Really Learn to Follow Instructions? An Empirical Study of Instruction Tuning

Paper · arXiv 2305.11383 · Published May 19, 2023
Tags: Training · Fine-Tuning

Despite impressive performance gains, what models actually learn from instruction tuning (IT) remains understudied. In this work, we analyze how models utilize instructions during IT by comparing training on altered vs. original instructions. Specifically, we create simplified task definitions, which remove all semantic components and keep only the output-space information, and delusive examples, which contain incorrect input-output mappings. Our experiments show that models trained on simplified task definitions or delusive examples achieve performance comparable to models trained on the original instructions and examples. Furthermore, we introduce a random baseline for zero-shot classification tasks and find that it achieves performance (42.6% exact match) similar to IT (43% exact match).
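The random baseline described above can be sketched as follows. This is a minimal illustration, not the paper's code: it assumes each classification example carries its label options (the output space) and a reference answer, with field names chosen here for illustration, and it scores by exact match.

```python
import random

def random_baseline_exact_match(examples, seed=0):
    """Guess a label uniformly at random from each example's output
    space and score exact match against the reference answer."""
    rng = random.Random(seed)
    correct = 0
    for ex in examples:
        guess = rng.choice(ex["options"])  # random pick from the label set
        correct += int(guess == ex["answer"])
    return correct / len(examples)

# Toy usage with hypothetical examples (field names are illustrative):
examples = [
    {"options": ["yes", "no"], "answer": "yes"},
    {"options": ["yes", "no"], "answer": "no"},
    {"options": ["positive", "negative", "neutral"], "answer": "neutral"},
]
print(random_baseline_exact_match(examples))
```

On a large balanced binary-label set, such a baseline converges to roughly 50% exact match; the point of the comparison is that knowing only the output space already recovers much of IT's zero-shot classification score.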