Can models express calibrated confidence in long-form text?
Can language models be trained to emit extended passages with confidence statements that actually help readers make accurate probabilistic predictions? This matters because confident hallucinations mislead users into bad decisions.
Confident hallucinations lead users to confidently wrong decisions, and existing models cannot emit long-form text with calibrated confidence. This work defines linguistic calibration through the lens of decision-making: an LM is linguistically calibrated if its generations enable users to make calibrated probabilistic predictions about the world. That definition yields a two-step training framework. A supervised finetuning (SFT) step bootstraps the model to emit long-form text with confidence statements ("I estimate a 30% chance of..."; "I am certain that..."), and a reinforcement learning (RL) step rewards generations that let a user provide calibrated answers to related questions. The resulting linguistically calibrated Llama-2-7B is significantly better calibrated than strong factuality-finetuned baselines at comparable accuracy, and it generalizes under domain shift (scientific, biomedical, and held-out biography generation).
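The RL step's reward can be sketched with a proper scoring rule applied to a reader's forecast. This is a minimal illustration, not the paper's implementation: the note does not say which scoring rule is used, so the log score here is an assumption, and the function names, the forecast dictionaries, and the clipping constant `eps` are all hypothetical.

```python
import math


def log_score_reward(forecast, true_answer, eps=1e-6):
    """Reward = log of the probability the reader gave the true answer.

    `forecast` maps candidate answers to probabilities. It stands in for
    the prediction a (simulated) reader makes after reading a generation.
    `eps` is an assumed clipping constant that keeps the reward finite.
    """
    p = forecast.get(true_answer, 0.0)
    return math.log(max(p, eps))


def expected_log_score(reported, true_prob):
    """Expected log score of reporting `reported` for a binary event
    that actually occurs with probability `true_prob`."""
    return (true_prob * math.log(reported)
            + (1 - true_prob) * math.log(1 - reported))


# Hypothetical reader forecasts induced by two generations.
hedged_forecast = {"1912": 0.7, "1913": 0.3}          # "I estimate a 70% chance..."
overconfident_forecast = {"1912": 0.0, "1913": 1.0}   # "I am certain that..."

reward_hedged = log_score_reward(hedged_forecast, "1912")
reward_overconfident = log_score_reward(overconfident_forecast, "1912")
```

When the true answer is "1912", the hedged generation earns a much higher reward than the confidently wrong one. Because the log score is a proper scoring rule, `expected_log_score` is maximized when the reported probability matches the true one, which is why optimizing a reward of this kind pushes toward calibration rather than toward blanket hedging or blanket certainty.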
The keeper is the decision-theoretic definition: calibration is a property not of token probabilities but of whether a reader ends up calibrated. That makes confidence statements first-class, trainable content rather than a post-hoc number.
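One way to make "a reader ends up calibrated" measurable is expected calibration error (ECE) computed over the reader's forecasts rather than over the model's token probabilities. The sketch below is illustrative only: the equal-width binning scheme, the function name, and the toy data are assumptions, and the note does not say which calibration metric the work reports.

```python
def expected_calibration_error(confidences, outcomes, n_bins=10):
    """ECE over reader forecasts.

    `confidences` are the probabilities a reader assigned to their answers
    after reading a generation; `outcomes` are 1 if that answer was correct
    and 0 otherwise. Equal-width bins over [0, 1] are an assumed choice.
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, outcome in zip(confidences, outcomes):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 goes in the last bin
        bins[idx].append((conf, outcome))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(o for _, o in members) / len(members)
        # Weight each bin's confidence/accuracy gap by its share of forecasts.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece


# Toy data: a reader who says 75% and is right 3 times out of 4,
# versus a reader who says 100% and is right half the time.
calibrated_ece = expected_calibration_error([0.75] * 4, [1, 1, 1, 0])
overconfident_ece = expected_calibration_error([1.0] * 4, [1, 1, 0, 0])
```

On this toy data the first reader has zero calibration error and the second has an error of 0.5, even though neither number says anything about the model's token-level probabilities.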
This operationalizes, for long-form generation, the metacognitive third path the vault already names. In terms of "Can models express uncertainty instead of just answering?", linguistic calibration is the training method that produces faithful uncertainty at paragraph scale. It also complements "Can model confidence work as a reward signal for reasoning?" (confidence-as-reward) with a reward grounded in user decisions.
Related concepts in this collection 3
- Can models express uncertainty instead of just answering?
  Most factuality work expands what models know rather than what they know they know. Can expressing calibrated uncertainty create a third path between confident errors and unhelpful abstention?
  linguistic calibration is a training method that yields faithful uncertainty at long-form scale
- Can model confidence work as a reward signal for reasoning?
  Explores whether using a language model's own confidence scores as training rewards can simultaneously improve reasoning accuracy and restore calibration that standard RLHF damages.
  sibling calibration-via-RL; this rewards calibrated *user* predictions rather than model confidence
- Does reasoning fine-tuning make models worse at declining to answer?
  When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
  the abstention-erosion problem linguistic calibration's graded confidence statements help avoid
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Linguistic Calibration of Long-Form Generations
- Post-Training Large Language Models via Reinforcement Learning from Self-Feedback
- A Survey of Calibration Process for Black-Box LLMs
- Overconfidence in LLM-as-a-Judge: Diagnosis and Confidence-Driven Solution
- Fine-tuning Language Models for Factuality
- Humans overrely on overconfident language models, across languages
- Beyond Accuracy: The Role of Calibration in Self-Improving Large Language Models
- Deep Research: A Systematic Survey
Original note title
linguistic calibration trains long-form generation with verbal confidence statements that let users make calibrated predictions