Critique Fine-Tuning: Learning to Critique is More Effective than Learning to Imitate
Supervised Fine-Tuning (SFT) is commonly used to train language models to imitate annotated responses for given instructions. In this paper, we challenge this paradigm and propose Critique Fine-Tuning (CFT), a strategy where models learn to critique noisy responses rather than simply imitate correct ones. Inspired by human learning processes that emphasize critical thinking, CFT encourages deeper analysis and nuanced understanding—traits often overlooked by standard SFT.
During SFT, LLMs are trained to imitate the annotated responses. Numerous efforts have been made to build high-quality SFT datasets using approaches like Self-Instruct (Wang et al., 2023b) and Evol-Instruct (Xu et al., 2024) to enhance LLMs' general instruction-following capabilities.
We challenge the prevailing paradigm of SFT and propose a new learning framework called Critique Fine-Tuning (CFT). Inspired by human learning, where critical thinking and constructive feedback are vital for improvement, we shift the focus from simple imitation to critique-based learning. When humans learn, they do not merely replicate provided answers but analyze, critique, and refine them. Similarly, in CFT, the model learns to critique noisy responses: identifying flaws, suggesting improvements, and verifying correctness. Formally, CFT trains the model to critique a given query-response pair, maximizing the likelihood P(c | [x; y]), where c is the annotated critique for the query-response pair [x; y]. A detailed visualization of CFT is presented in Figure 1.
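To make this objective concrete, the following is a minimal sketch of one CFT training step with a Hugging Face causal LM. The model name, prompt template, and toy example are illustrative assumptions rather than the paper's exact setup; the key point is that the cross-entropy loss is applied only to the critique tokens c, while the query-response pair [x; y] serves purely as conditioning context.

```python
# Minimal CFT training-step sketch: maximize P(c | [x; y]).
# Model name, prompt template, and the toy example are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-0.5B"  # placeholder; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

query = "What is 17 * 24?"                     # x
noisy_response = "17 * 24 = 398."              # y (possibly wrong)
critique = "Incorrect. 17 * 24 = 17 * 20 + 17 * 4 = 340 + 68 = 408."  # c

# Concatenate [x; y] as the conditioning context, followed by the critique c.
prompt = f"Question: {query}\nResponse: {noisy_response}\nCritique: "
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
critique_ids = tokenizer(
    critique + tokenizer.eos_token, return_tensors="pt", add_special_tokens=False
).input_ids

input_ids = torch.cat([prompt_ids, critique_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100  # mask [x; y]: loss only on critique tokens

loss = model(input_ids=input_ids, labels=labels).loss  # -log P(c | [x; y])
loss.backward()
```

Standard SFT would instead supervise the response tokens y given x alone; CFT keeps the same cross-entropy machinery but swaps the supervision target from the response to the critique.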
Through these experiments, we demonstrated CFT's efficiency and effectiveness over SFT. However, our approach has limitations. First, the critique dataset was synthesized entirely by GPT-4o, and at least 20% of the critiques contain errors; improving the quality of the critique data could further enhance performance. Second, CFT-trained models currently lack the ability to perform self-critique, so we have not observed self-improvement effects.
In this paper, we introduced Critique Fine-Tuning (CFT), a novel paradigm that fundamentally reimagines how language models learn from instruction data. Unlike traditional Supervised Fine-Tuning (SFT), which focuses on response imitation, CFT emphasizes critical thinking by teaching models to critique and analyze responses rather than simply imitate them.
Critique Model The critique model is distinct from self-correction: here, a specialized model provides feedback to an existing model to assist its generation process. Reward models are the most popular critique models used in mathematical reasoning. Recently, various outcome reward models (Uesato et al., 2022; Yang et al., 2024b) and process reward models (Wang et al., 2024a; Lightman et al., 2023a; Yuan et al., 2024) have been explored to enhance LLMs' reasoning capabilities. However, these critique models are mostly designed to estimate a reward score directly, without intermediate reasoning. The closest to our work is critique-out-loud (Ankner et al.), which serves only as a reward model rather than an actor. Our paper differs from both notions: we use 'critique' simply as a learning objective to push the model toward a deeper understanding of the problem. At inference time, the trained model simply generates a response without any critique or refinement process.
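As a companion to the training sketch above, here is what inference looks like under the same illustrative assumptions: the CFT-trained model is prompted with the query alone and decodes an answer directly, with no critique or refinement step in the loop.

```python
# Inference sketch for a CFT-trained model (checkpoint path is hypothetical):
# prompt with the query alone and generate the answer directly.
from transformers import AutoModelForCausalLM, AutoTokenizer

ckpt = "path/to/cft-trained-model"  # hypothetical CFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForCausalLM.from_pretrained(ckpt)

inputs = tokenizer("Question: What is 17 * 24?\nAnswer: ", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```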