ProSA: Assessing and Understanding the Prompt Sensitivity of LLMs
Our extensive study, spanning multiple tasks, finds that prompt sensitivity varies across datasets and models, with larger models exhibiting greater robustness. We observe that few-shot examples can alleviate this sensitivity, and that subjective evaluations are also susceptible to prompt sensitivity, particularly in complex, reasoning-oriented tasks.
Our findings suggest that prompt sensitivity essentially reflects the model's confidence: higher confidence in its outputs correlates with greater robustness to semantic variations in the prompt.
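The confidence-robustness link above can be made concrete with a toy sketch: score each decoded answer by a confidence proxy (here, mean token probability, an assumption for illustration rather than the paper's PromptSensiScore metric), then measure sensitivity as the spread of that score across semantically equivalent prompt variants.

```python
import math

def sequence_confidence(token_logprobs):
    """Confidence proxy for one decoded answer: mean per-token probability.
    (Illustrative assumption; not the metric defined in ProSA.)"""
    return sum(math.exp(lp) for lp in token_logprobs) / len(token_logprobs)

def prompt_sensitivity(variant_confidences):
    """Sensitivity proxy: standard deviation of the confidence scores obtained
    from semantically equivalent prompt variants. A larger spread means the
    model's behavior depends more heavily on surface prompt wording."""
    mean = sum(variant_confidences) / len(variant_confidences)
    var = sum((c - mean) ** 2 for c in variant_confidences) / len(variant_confidences)
    return math.sqrt(var)

# A model that assigns identical confidence to every paraphrase is maximally
# robust under this proxy (sensitivity 0); diverging confidences raise it.
robust = prompt_sensitivity([0.9, 0.9, 0.9])
fragile = prompt_sensitivity([0.9, 0.5, 0.7])
```

Under this sketch, the paper's claim corresponds to the observation that variant sets with uniformly high confidence also show low spread, i.e. low sensitivity.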