Tags: Reinforcement Learning for LLMs · LLM Reasoning and Architecture · Language Understanding and Pragmatics

Does model confidence predict robustness to prompt changes?

Explores whether a model's certainty about its answer determines how much it resists prompt rephrasing and semantic variation. This matters because it could explain why some tasks are harder to evaluate reliably.

Note · 2026-03-28 · sourced from Prompts Prompting
Related questions: How should we allocate compute budget at inference time? · How do reasoning models actually fail under pressure?

ProSA (2024) provides the first systematic study of prompt sensitivity across multiple tasks and models, revealing that sensitivity is not random variation but a predictable function of model confidence.

The core finding: when a model is highly confident in its output, it is robust to prompt rephrasing, reordering, and semantic variation. When confidence is low, minor prompt changes cause significant output swings. This means prompt sensitivity is not a property of the prompt alone — it is a joint property of the prompt and the model's certainty about the underlying task.
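The "joint property" framing suggests a simple way to measure it. Below is a minimal sketch (not ProSA's actual protocol): query a model with several paraphrases of the same task, take disagreement with the majority answer as a sensitivity score, and average the model's reported answer probabilities as a confidence score. The `model` callable and its `(answer, probability)` return shape are assumptions for illustration.

```python
from collections import Counter

def prompt_sensitivity(model, paraphrases):
    """Sensitivity = fraction of paraphrases whose answer disagrees
    with the majority answer across all paraphrases.
    Confidence = mean of the model's reported answer probabilities.
    `model` is a hypothetical callable: prompt -> (answer, probability)."""
    results = [model(p) for p in paraphrases]
    answers = [answer for answer, _ in results]
    _, majority_count = Counter(answers).most_common(1)[0]
    sensitivity = 1 - majority_count / len(answers)
    confidence = sum(prob for _, prob in results) / len(results)
    return sensitivity, confidence
```

Under the ProSA finding, plotting `sensitivity` against `confidence` over many tasks should show a negative correlation: high-confidence tasks cluster near zero sensitivity.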

Three moderating factors: (1) larger models are more robust, consistent with the general trend that scale improves calibration; (2) few-shot examples reduce sensitivity by providing concrete anchors that lessen the model's reliance on prompt surface form; (3) subjective evaluation tasks are especially prompt-sensitive, particularly complex reasoning-oriented tasks where the model's confidence is naturally lower.

This connects to Can models learn to ignore irrelevant prompt changes? — BCT/ACT train invariance by exposing models to perturbed prompts and requiring consistent outputs. The ProSA finding explains WHY this works: consistency training pushes models toward high-confidence response regions where robustness is natural, rather than teaching robustness as a separate skill.
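One plausible shape for such an invariance objective is a divergence penalty between the model's answer distributions on the original and perturbed prompts. The exact BCT/ACT losses differ; this is an illustrative stand-in over plain probability lists.

```python
import math

def consistency_loss(p_orig, p_perturbed, eps=1e-9):
    """KL(p_orig || p_perturbed) between two answer distributions,
    one plausible consistency-training penalty (illustrative only;
    not the exact BCT/ACT objective). Zero when the model answers
    identically under both prompt formulations."""
    return sum(
        p * math.log((p + eps) / (q + eps))
        for p, q in zip(p_orig, p_perturbed)
    )
```

Minimizing this pushes the perturbed-prompt distribution toward the original one, which in the ProSA reading amounts to pushing the model into higher-confidence, naturally robust regions.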

The finding also has implications for Why do chain-of-thought examples fail across different conditions?: exemplar brittleness may be most severe on tasks where the model's confidence is borderline. On high-confidence tasks, exemplar ordering may matter less because the model "knows the answer" regardless.

For evaluation design: prompt sensitivity as a confidence signal means that benchmark results on single prompt formulations may be misleading exactly where they matter most — on difficult tasks where model confidence is low and prompt variation would produce the largest swings.
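A benchmark harness can surface this cheaply by scoring every example under several prompt formulations and reporting the spread alongside the mean. A minimal sketch, assuming a hypothetical `score_fn(prompt_variant, example)` that returns 0 or 1:

```python
import statistics

def robust_eval(score_fn, prompt_variants, examples):
    """Score each example under every prompt formulation, then report
    per-variant mean accuracy and the max-min spread. A large spread
    flags exactly the low-confidence regions where a single-prompt
    benchmark number would be misleading.
    `score_fn` is a hypothetical callable: (variant, example) -> 0 or 1."""
    per_variant = [
        sum(score_fn(variant, ex) for ex in examples) / len(examples)
        for variant in prompt_variants
    ]
    return {
        "mean": statistics.mean(per_variant),
        "spread": max(per_variant) - min(per_variant),
    }
```

Reporting `spread` next to `mean` makes the confidence signal visible: tasks where the spread is large are the ones where the headline accuracy depends on an arbitrary prompt choice.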


Original note: prompt sensitivity is a reflection of model confidence — higher confidence correlates with increased robustness against prompt semantic variations.