Can models express calibrated confidence in long-form text?

Can language models be trained to emit extended passages with confidence statements that actually help readers make accurate probabilistic predictions? This matters because confident hallucinations mislead users into bad decisions.

Synthesis note · 2026-06-03 · sourced from Reinforcement Learning

Confident hallucinations lead users to make bad decisions with confidence, and existing models cannot emit long-form text with calibrated confidence. This work defines linguistic calibration through the lens of decision-making: an LM is linguistically calibrated if its generations enable users to make calibrated probabilistic predictions about the world. That definition yields a clean two-step training framework. A supervised finetuning (SFT) step bootstraps the model to emit long-form text with confidence statements ("I estimate a 30% chance of..."; "I am certain that..."), and a reinforcement learning (RL) step rewards generations that let a user provide calibrated answers to related questions. The resulting Llama-2-7B model is significantly better calibrated than strong finetuned factuality baselines at comparable accuracy, and the improvement generalizes under domain shift (scientific and biomedical domains, and held-out biography generation).
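The RL reward described above can be illustrated with a proper scoring rule. The following is a minimal sketch, assuming the reader's answer to a related question is available as a probability distribution over candidate answers and that the reward is the log score. The note does not say which scoring rule the work uses, so the log score, the function names, and the example forecasts are all illustrative assumptions.

```python
import math
from typing import Dict

def log_score_reward(reader_forecast: Dict[str, float], true_answer: str,
                     eps: float = 1e-6) -> float:
    """Log scoring rule: log of the probability the reader put on the true answer.

    Assumption: the reader's forecast is a dict mapping candidate answers to
    probabilities. `eps` clips zero probabilities to keep the reward finite.
    """
    p = reader_forecast.get(true_answer, 0.0)
    return math.log(max(p, eps))

def expected_log_score(p_true: float, q_reported: float) -> float:
    """Expected log score on a binary question when the event truly occurs with
    probability p_true and the reader reports probability q_reported."""
    return p_true * math.log(q_reported) + (1.0 - p_true) * math.log(1.0 - q_reported)

# Hypothetical reader forecasts after reading two different generations
# about the same question, whose true answer is "1952".
hedged = {"1947": 0.3, "1952": 0.7}          # passage hedged appropriately
overconfident = {"1947": 0.99, "1952": 0.01}  # passage was confidently wrong

r_hedged = log_score_reward(hedged, "1952")
r_over = log_score_reward(overconfident, "1952")

# Properness: if the event really has probability 0.7, reporting 0.7 earns a
# higher expected reward than reporting 0.99.
honest = expected_log_score(0.7, 0.7)
inflated = expected_log_score(0.7, 0.99)
```

Because the log score is proper, a generation that pushes the reader toward an inflated probability earns a lower expected reward than one that conveys the true uncertainty, which is the incentive the decision-based definition calls for.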

The keeper is the decision-theoretic definition: calibration is not a property of token probabilities but of whether a reader ends up calibrated. That makes confidence statements first-class, trainable content rather than a post-hoc number.
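One way to check whether a reader "ends up calibrated" is to compute the expected calibration error (ECE) over the reader's forecasts instead of over the model's token probabilities. The sketch below assumes each forecast is reduced to a confidence in the reader's chosen answer plus a correctness flag. ECE is a standard calibration metric, but this note does not state which metric the work reports, so treat the choice and the example data as assumptions.

```python
from typing import Sequence

def expected_calibration_error(confidences: Sequence[float],
                               correct: Sequence[bool],
                               n_bins: int = 10) -> float:
    """Equal-width-bin ECE over reader forecasts.

    confidences: probability the reader assigned to their chosen answer.
    correct:     whether that chosen answer was actually right.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for c, ok in zip(confidences, correct):
        idx = min(int(c * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[idx].append((c, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        # Weight each bin's confidence/accuracy gap by its share of forecasts.
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece

# Hypothetical readers who are right 7 times out of 10.
outcomes = [True] * 7 + [False] * 3
ece_calibrated = expected_calibration_error([0.7] * 10, outcomes)
ece_overconfident = expected_calibration_error([0.99] * 10, outcomes)
```

A reader who reports 70% confidence and is right 70% of the time scores near zero, while a reader led to 99% confidence at the same accuracy scores about 0.29. The metric is computed entirely on the reader's side, matching the definition's focus on the reader over the model's internal probabilities.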

This operationalizes, for long-form generation, the metacognitive third path the vault already names. Building on "Can models express uncertainty instead of just answering?", linguistic calibration is the training method that produces faithful uncertainty at paragraph scale. It complements "Can model confidence work as a reward signal for reasoning?" (confidence-as-reward) with a reward grounded in user decisions.

Related concepts in this collection (3)

Can models express uncertainty instead of just answering?
Most factuality work expands what models know rather than what they know they know. Can expressing calibrated uncertainty create a third path between confident errors and unhelpful abstention?
linguistic calibration is a training method that yields faithful uncertainty at long-form scale

Can model confidence work as a reward signal for reasoning?
Explores whether using a language model's own confidence scores as training rewards can simultaneously improve reasoning accuracy and restore calibration that standard RLHF damages.
sibling calibration-via-RL; this rewards calibrated *user* predictions rather than model confidence

Does reasoning fine-tuning make models worse at declining to answer?
When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
the abstention-erosion problem that linguistic calibration's graded confidence statements help avoid
