A Survey on Prompt Tuning
Prompt tuning has emerged as a promising parameter-efficient fine-tuning (PEFT) approach that offers several advantages: (1) parameter efficiency, updating only a small set of continuous prompt vectors while keeping the pretrained language model frozen; (2) modular adaptation through task-specific prompts that leave the original model parameters untouched, enabling efficient deployment; (3) framework flexibility, supporting various knowledge transfer and composition mechanisms that facilitate multi-task learning and domain adaptation (Lester et al., 2021).
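The core mechanism behind advantage (1) can be sketched in a few lines. The embedding table, shapes, and prompt length below are illustrative stand-ins, not taken from any particular model: only the small prompt matrix is trainable, while the backbone embeddings stay frozen.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size, d_model = 1000, 64    # toy "pretrained" embedding table (frozen)
prompt_len = 20                   # number of trainable soft prompt tokens

embedding_table = rng.normal(size=(vocab_size, d_model))  # frozen backbone weights
soft_prompt = rng.normal(size=(prompt_len, d_model))      # the ONLY trainable tensor

def build_input(token_ids):
    """Prepend the soft prompt to the (frozen) token embeddings."""
    token_embeds = embedding_table[token_ids]             # (seq_len, d_model)
    return np.concatenate([soft_prompt, token_embeds], axis=0)

x = build_input(np.array([1, 2, 3]))
print(x.shape)           # (23, 64): prompt_len + seq_len embedding rows
print(soft_prompt.size)  # 1280 trainable parameters vs. 64000 frozen ones
```

During training, gradients flow only into `soft_prompt`; a new task needs only a new copy of this small matrix, which is what makes the deployment modular.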
The growing significance of prompt tuning is evidenced by its dual influence: for the academic community, it intersects with various areas including model adaptation, knowledge transfer (Hinton et al., 2015), continual learning (Wang et al., 2024), and model interpretability (Wang et al., 2024); for commercial applications, it enables economical model customization, reduces deployment overhead, and accelerates application development (Li et al., 2025b; Li & Cole, 2025). This widespread adoption across both domains necessitates a survey to understand its underlying mechanisms, framework designs, innovations, and development trends.
We present a comprehensive taxonomy of prompt tuning approaches, organizing existing methods into two main branches as shown in Figure 1: direct prompt learning and transfer learning. Direct prompt learning encompasses methods that perform single-stage training on target tasks. Transfer learning methods, on the other hand, leverage knowledge from source tasks to improve performance on target tasks.
Transfer learning in prompt tuning leverages knowledge from source tasks to improve performance on target tasks (Gu et al., 2022).
This framework comprises three essential components: (1) knowledge transfer strategies that address how to extract and encode knowledge from source tasks; (2) task adaptation mechanisms that determine how to utilize transferred knowledge for target tasks; (3) cross-domain methods that bridge gaps between source and target domains, such as handling different label spaces. The effectiveness of transfer learning in prompt tuning depends on several variables: task similarity between source and target domains, the quality of source task data, and the robustness of the transfer mechanism. Studies have shown that well-designed transfer approaches can improve prompt tuning performance, particularly in few-shot scenarios where target task data is limited.
SPoT (Vu et al., 2022) measures the similarity between a new target task and each source task using early checkpoints of the source-task prompts, selects the most similar source task, and initializes the target prompt from that task's final checkpoint, followed by continued prompt tuning on the target task. Several key insights emerge from SPoT: (1) prompt transfer can effectively improve performance; (2) task similarity strongly correlates with transfer effectiveness, with high-quality source tasks being large-scale datasets, complex reasoning tasks, or tasks similar to the target; (3) early prompt checkpoints serve as better task embeddings than final checkpoints for measuring task similarity, suggesting that task-specific knowledge is captured in the early training stage. While SPoT matches or outperforms full model fine-tuning across all model sizes while tuning only a small number of parameters, it exhibits certain limitations: high sensitivity to source task selection, incomplete capture of transfer characteristics through task embeddings, and additional computational overhead for computing task embeddings.
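The selection-and-initialization procedure can be sketched as follows. The task names, checkpoint shapes, and mean-pooled task embedding below are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

rng = np.random.default_rng(1)
prompt_len, d_model = 10, 32

def task_embedding(prompt):
    """Mean-pool a (prompt_len, d_model) prompt into a single task vector."""
    return prompt.mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Early and final prompt checkpoints for three hypothetical source tasks.
sources = {
    name: {"early": rng.normal(size=(prompt_len, d_model)),
           "final": rng.normal(size=(prompt_len, d_model))}
    for name in ["nli", "qa", "summarization"]
}
# Toy target whose early prompt happens to sit near qa's early checkpoint.
target_early = sources["qa"]["early"] + 0.05 * rng.normal(size=(prompt_len, d_model))

t = task_embedding(target_early)
scores = {name: cosine(t, task_embedding(ckpt["early"]))
          for name, ckpt in sources.items()}
best = max(scores, key=scores.get)
target_init = sources[best]["final"].copy()  # warm-start, then keep tuning
print(best)                                  # "qa"
```

Note that similarity is computed from *early* checkpoints (insight 3), while the warm-start uses the *final* checkpoint of the selected source.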
XPrompt is an efficient prompt tuning method that applies pruning to soft prompt tokens (Ma et al., 2022). This approach consists of three steps: (1) vanilla prompt tuning on the target tasks; (2) hierarchical pruning, which first removes negative soft prompt tokens at the token level and then applies fine-grained pruning at the piece level; (3) weight rewinding to retrain the identified positive soft prompt tokens.
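The three steps can be sketched as follows; the magnitude-based importance scores are a simple stand-in for the saliency criterion the method actually trains, and all shapes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
prompt_len, d_model = 8, 16
pieces = 4                              # each token embedding split into 4 pieces

prompt = rng.normal(size=(prompt_len, d_model))  # after vanilla prompt tuning
initial_ckpt = prompt.copy()                     # checkpoint for weight rewinding

# Step 2a: token-level pruning -- drop the lowest-importance half of the tokens.
token_scores = np.abs(prompt).mean(axis=1)             # proxy importance per token
token_mask = token_scores >= np.median(token_scores)   # keep the top half

# Step 2b: piece-level pruning within the surviving tokens.
piece_view = prompt.reshape(prompt_len, pieces, d_model // pieces)
piece_scores = np.abs(piece_view).mean(axis=2)         # (prompt_len, pieces)
piece_mask = piece_scores >= np.median(piece_scores, axis=1, keepdims=True)

# Step 3: weight rewinding -- restart the surviving entries from the initial
# checkpoint (then retrain them); pruned entries are zeroed out.
mask = token_mask[:, None, None] & piece_mask[:, :, None]
pruned = (initial_ckpt.reshape(piece_view.shape) * mask).reshape(prompt_len, d_model)
print(int(token_mask.sum()), "tokens kept of", prompt_len)
```

The two-granularity mask is the key design choice: a token that survives step 2a can still lose individual pieces in step 2b, yielding a finer sparsity pattern than token pruning alone.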
The key insights of P-Tuning v2 include: (1) prompt depth influences model performance, and adding soft prompts to deeper layers can match the performance of full-layer prompting; (2) optimal prompt length varies with task complexity, with simple classification tasks preferring shorter soft prompts while sequence labeling tasks benefit from longer ones; (3) the effect of reparameterization is task-dependent rather than universally beneficial.
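A back-of-envelope comparison makes the depth trade-off concrete. The layer count, hidden size, and prompt lengths below are illustrative, and the deep variant is simplified to one prefix per transformer layer:

```python
# Shallow prompting: one soft prompt at the input layer only.
# Deep prompting (P-Tuning v2 style): a prefix injected into every layer.
num_layers, d_model = 24, 1024
shallow_len = 20
deep_len = 20

shallow_params = shallow_len * d_model               # input-layer prompt only
deep_params = num_layers * deep_len * d_model        # one prefix per layer
print(shallow_params, deep_params)                   # 20480 491520
```

Even the deep variant here is roughly 0.5M parameters, still tiny next to the frozen backbone, which is why per-layer prompting stays parameter-efficient while recovering much of full fine-tuning's capacity.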
Several key insights emerge from CTPT: (1) emotional knowledge is inherently transferable across conversation datasets despite varying label nomenclatures; (2) performance improvement correlates linearly with the number of source tasks, indicating scalable knowledge transfer; (3) the intrinsic dimensionality of prompt parameters is substantially lower than their apparent dimensionality. While CTPT demonstrates computational efficiency and robust zero-shot transfer capabilities, it exhibits certain limitations: requiring multiple forward passes for optimization, slower convergence compared to gradient-based methods, and decreased performance when source and target tasks differ significantly.
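Insight (3) and the forward-pass-only limitation can be illustrated with a toy objective. The random projection and the simple (1+1) evolution strategy below are simplifications assumed for illustration, not CTPT's actual search procedure:

```python
import numpy as np

rng = np.random.default_rng(3)
prompt_dim, intrinsic_dim = 10 * 64, 16   # 640-dim prompt, 16-dim search space
A = rng.normal(size=(prompt_dim, intrinsic_dim)) / np.sqrt(intrinsic_dim)
target = rng.normal(size=prompt_dim)      # stand-in for a "good" prompt

def loss(z):
    """Proxy objective: distance of the projected prompt to the target.
    In the real setting this would be one forward pass through the model."""
    return float(np.linalg.norm(A @ z - target))

# (1+1) evolution strategy: accept a Gaussian perturbation only when it
# lowers the loss. No gradients anywhere -- only repeated evaluations.
z = np.zeros(intrinsic_dim)
best = loss(z)
for _ in range(500):
    cand = z + 0.1 * rng.normal(size=intrinsic_dim)
    cand_loss = loss(cand)
    if cand_loss < best:
        z, best = cand, cand_loss
print(best < loss(np.zeros(intrinsic_dim)))  # True: loss improved
```

Each accepted step costs one objective evaluation (a forward pass), which is exactly why such methods need many passes and converge more slowly than gradient descent; searching in the 16-dimensional subspace instead of all 640 prompt dimensions is what makes the search tractable at all.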
5. Challenges and Future Work
5.1. Current Challenges
Computational efficiency. The addition of soft prompts extends input sequence length, increasing memory consumption and computational costs. This overhead becomes substantial when processing long sequences or storing multiple task-specific prompts (Tay et al., 2022).
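The overhead is easy to quantify for the quadratic self-attention term; the input and prompt lengths below are illustrative:

```python
# Self-attention cost scales quadratically in sequence length, so prepending
# p soft prompt tokens to an n-token input grows that term from n^2 to (n+p)^2.
n, p = 512, 100                 # illustrative input and prompt lengths
overhead = (n + p) ** 2 / n ** 2
print(f"{overhead:.2f}x")       # ~1.43x attention compute for this example
```

The relative overhead shrinks for long inputs but is paid on every forward pass, and storing a separate prompt per task adds a per-task memory cost on top.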
Training instability. Prompt tuning is highly sensitive to hyperparameter choices, particularly learning rates, leading to convergence difficulties and inconsistent performance across different initializations (Gu et al., 2022). Careful hyperparameter tuning is therefore required to achieve stable and optimal results.
Prompt initialization. Initialization strategies for soft prompts, whether random or embedding-based, influence model performance and convergence, yet current approaches can yield suboptimal results (Liu et al., 2022a).
Model scale dependency. The effectiveness of prompt tuning correlates with model scale, performing well on large models but degrading on smaller ones, which limits its applicability in resource-constrained settings (Lester et al., 2021).
Explainability. The semantic meaning of learned prompts and their interaction mechanisms with pretrained models remain poorly understood, which limits principled improvement of these methods.