Not All Parameters Are Created Equal: Smart Isolation Boosts Fine-Tuning Performance
Supervised fine-tuning (SFT) is a pivotal approach to adapting large language models (LLMs) for downstream tasks; however, performance often suffers from the “seesaw phenomenon”, where indiscriminate parameter updates yield progress on certain tasks at the expense of others. To address this challenge, we propose a novel Core Parameter Isolation Fine-Tuning (CPI-FT) framework. Specifically, we first independently fine-tune the LLM on each task to identify its core parameter regions by quantifying parameter update magnitudes. Tasks with similar core regions are then grouped based on region overlap, forming clusters for joint modeling. We further introduce a parameter fusion technique: for each task, core parameters from its individually fine-tuned model are directly transplanted into a unified backbone, while non-core parameters from different tasks are smoothly integrated via Spherical Linear Interpolation (SLERP), mitigating destructive interference. A lightweight, pipelined SFT training phase using mixed-task data is subsequently employed, while freezing core regions from prior tasks to prevent catastrophic forgetting. Extensive experiments on multiple public benchmarks demonstrate that our approach significantly alleviates task interference and forgetting, consistently outperforming vanilla multi-task and multi-stage fine-tuning baselines.
Supervised fine-tuning (SFT) faces significant challenges in multi-task and multi-domain scenarios. When applied to heterogeneous datasets, such as mathematical reasoning, creative writing, coding, and factual question answering, conflicting optimization objectives among tasks often lead to the "seesaw effect" (Yu et al., 2020), where performance improvements on one task degrade others. This issue hinders the development of robust, broadly capable large language models (LLMs).
We hypothesize that the root cause of these challenges lies in the phenomenon of parameter heterogeneity: distinct capabilities of large language models (LLMs) rely on specific and potentially overlapping subsets of parameters, with certain clusters disproportionately contributing to particular tasks. Uniform updates across the entire parameter space fail to account for the specialized roles of these localized parameter subsets, thereby fostering destructive interference among competing tasks (Chen et al., 2018). Mitigating such interference necessitates a paradigm shift from heuristic approaches or task-level isolation to a principled framework that explicitly models task sensitivities at the parameter level. Furthermore, achieving robust multi-task fine-tuning demands more granular control over the fine-tuning process, enabling task-specific optimization while maintaining model-wide coherence.
Motivated by these observations, we introduce the Core Parameter Isolation Fine-Tuning (CPI-FT) framework, featuring a novel parameter fusion mechanism designed to systematically alleviate task interference and catastrophic forgetting in SFT. Our approach involves several key steps. First, we independently fine-tune the LLM on each task and identify its “core parameter region”: the parameter subset most crucial to that task, determined by update magnitude. Next, we cluster tasks according to the overlap of their core parameter regions, grouping together tasks with similar parameter footprints that are more likely to benefit from joint adaptation with minimal conflict. In the subsequent fusion stage, we take the model from the final training stage as a unified backbone.
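The first two steps, identifying each task's core region from update magnitudes and grouping tasks by region overlap, can be sketched as follows. The top-`ratio` magnitude threshold and the Jaccard overlap measure are illustrative assumptions; the text above does not fix these specifics:

```python
import numpy as np

def core_region(base, tuned, ratio=0.05):
    """Indices of the top-`ratio` parameters by update magnitude |tuned - base|.

    `ratio` is an assumed hyperparameter; the paper only states that core
    regions are identified by quantifying update magnitudes.
    """
    delta = np.abs(tuned - base)
    k = max(1, int(ratio * delta.size))
    return set(np.argsort(delta)[-k:].tolist())

def overlap(region_a, region_b):
    """Jaccard overlap between two core regions (one plausible overlap measure)."""
    inter = len(region_a & region_b)
    union = len(region_a | region_b)
    return inter / union if union else 0.0

# Toy example: a 10-parameter "model", two tasks updating different slices.
rng = np.random.default_rng(0)
base = rng.normal(size=10)
task_a = base.copy(); task_a[:3] += 1.0   # task A mostly moves params 0-2
task_b = base.copy(); task_b[2:5] += 1.0  # task B mostly moves params 2-4

ra = core_region(base, task_a, ratio=0.3)  # -> {0, 1, 2}
rb = core_region(base, task_b, ratio=0.3)  # -> {2, 3, 4}
print(overlap(ra, rb))                      # high overlap would place A and B in one cluster
```

Tasks whose pairwise overlaps exceed a chosen threshold would then be clustered for joint modeling.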
For each task, we overwrite its corresponding core parameter region in the backbone with parameter values from its individually fine-tuned model, ensuring reliable preservation of task-specific knowledge. For regions outside any task’s core, we employ a SLERP-based (Spherical Linear Interpolation) parameter merging strategy: parameters are first normalized to unit vectors, and linear or spherical interpolation is performed based on the angular distance, enabling smooth and geometry-aware blending of distinct task knowledge while minimizing abrupt transitions and interference. Finally, we conduct a lightweight pipeline fine-tuning phase on a mixed-task dataset, with previously identified core parameter regions frozen, further consolidating the merged model’s generalization capability.
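The SLERP-based merge of non-core parameters and the freezing of core regions can be sketched as below. The near-parallel fallback threshold and the gradient-masking form of freezing are assumptions for illustration, not the paper's exact implementation:

```python
import numpy as np

def slerp(v0, v1, t=0.5, eps=1e-8, dot_threshold=0.9995):
    """Spherical linear interpolation between two parameter vectors.

    Directions are compared via normalized vectors; when they are nearly
    parallel (angle ~ 0), SLERP is numerically unstable, so we fall back
    to plain linear interpolation, matching the "linear or spherical
    interpolation based on angular distance" description.
    """
    u0 = v0 / (np.linalg.norm(v0) + eps)
    u1 = v1 / (np.linalg.norm(v1) + eps)
    dot = np.clip(np.dot(u0, u1), -1.0, 1.0)
    if abs(dot) > dot_threshold:          # nearly parallel: LERP fallback
        return (1 - t) * v0 + t * v1
    theta = np.arccos(dot)                # angle between the two directions
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * v0 + (np.sin(t * theta) / s) * v1

def freeze_core(grad, core_idx):
    """Zero gradient entries for frozen core parameters: a simple way to
    keep prior tasks' core regions fixed during the final mixed-task SFT pass."""
    g = grad.copy()
    g[list(core_idx)] = 0.0
    return g
```

For example, merging two orthogonal non-core parameter vectors with `t=0.5` yields an equal-angle blend rather than a straight average, which is the geometry-aware behavior SLERP is chosen for.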
Superiority of CPI-FT over Standard SFT Approaches
The full multi-task supervised fine-tuning (Full SFT) baseline, in which all model parameters are updated uniformly across tasks without isolation, consistently achieves the lowest performance across all tasks and model configurations. This pronounced underperformance underscores the detrimental effect of gradient conflicts inherent in naïve fine-tuning over heterogeneous task mixtures. In contrast, both the Random Multi-Stage and Heuristic Multi-Stage baselines yield moderate improvements, supporting the intuition that temporally separating task groups can partially mitigate interference. However, even the strongest multi-stage heuristic consistently underperforms CPI-FT. This performance gap reveals a key insight: temporal task scheduling alone cannot resolve cross-task interference without explicit structural parameter isolation.