Can models express uncertainty instead of just answering?
Most factuality work expands what models know rather than sharpening their awareness of what they know. Can expressing calibrated uncertainty create a third path between confident errors and unhelpful abstention?
Even on the simplest setting — factoid QA with clear ground truth and no external tools — frontier models still hallucinate. The paper's diagnosis is that most factuality gains have come from expanding the model's knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). It conjectures the latter is inherently hard: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucination and preserving utility.
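The conjectured tradeoff can be made concrete with a toy example. The sketch below is an illustration only, not the paper's experiment: the confidence scores are invented, and `answer_or_abstain_stats` is a hypothetical helper. It assumes the model's confidence for correct and incorrect answers overlaps, and shows that a single abstention threshold then either lets errors through or discards correct answers.

```python
# Toy illustration; all scores are invented for this example.
# Each item is (model_confidence, answer_is_correct).
ITEMS = [
    (0.95, True), (0.90, True), (0.80, False), (0.75, True),
    (0.60, True), (0.55, False), (0.40, False), (0.30, True),
]


def answer_or_abstain_stats(items, threshold):
    """Answer when confidence >= threshold, abstain otherwise.

    Returns (confident_errors, lost_correct): wrong answers that were
    given, and correct answers that were withheld.
    """
    confident_errors = sum(1 for c, ok in items if c >= threshold and not ok)
    lost_correct = sum(1 for c, ok in items if c < threshold and ok)
    return confident_errors, lost_correct


for threshold in (0.0, 0.5, 0.85):
    errors, lost = answer_or_abstain_stats(ITEMS, threshold)
    print(f"threshold={threshold:.2f}  confident_errors={errors}  lost_correct={lost}")
```

With these invented scores, no threshold reaches zero errors without withholding correct answers, which is the answer-or-abstain dilemma in miniature.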
That tradeoff dissolves under a reframing. If hallucination is understood as confident error — incorrect information delivered without appropriate qualification — then a third path opens beyond the answer-or-abstain dichotomy: expressing uncertainty. The proposal is faithful uncertainty: aligning the model's linguistic uncertainty with its intrinsic uncertainty. This is one facet of metacognition — being aware of one's own uncertainty and acting on it.
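One way to picture the alignment of linguistic with intrinsic uncertainty is the minimal sketch below. It rests on assumptions that are not from the paper: intrinsic uncertainty is approximated by agreement among repeated samples of the model's answer, and the `HEDGES` phrase bands, `intrinsic_confidence`, `hedge_for`, and `faithful_response` are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical mapping from confidence lower bounds to hedging phrases,
# ordered from most to least confident.
HEDGES = [
    (0.9, "I'm confident that"),
    (0.6, "I believe, but am not certain, that"),
    (0.0, "I'm unsure, but my best guess is that"),
]


def intrinsic_confidence(samples):
    """Assumed proxy for intrinsic uncertainty: the share of sampled
    answers that agree with the most common answer."""
    answer, count = Counter(samples).most_common(1)[0]
    return answer, count / len(samples)


def hedge_for(confidence):
    """Return the first phrase whose lower bound the confidence meets."""
    for lower_bound, phrase in HEDGES:
        if confidence >= lower_bound:
            return phrase
    return HEDGES[-1][1]


def faithful_response(samples):
    """Phrase the majority answer with a hedge matching sample agreement."""
    answer, confidence = intrinsic_confidence(samples)
    return f"{hedge_for(confidence)} the answer is {answer}."


# Five invented samples for the same question; four agree.
samples = ["Canberra", "Canberra", "Sydney", "Canberra", "Canberra"]
print(faithful_response(samples))
```

The point of the sketch is the coupling: the wording changes only because the measured agreement changes, so the expressed doubt tracks the internal signal rather than being a fixed stylistic habit.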
The framing matters because of how far it reaches. Faithful uncertainty becomes the control layer for robust agentic tool use, and it is fundamentally a form of honesty (accurately representing one's epistemic state rather than projecting false confidence), which connects it to AI safety. It also enables appropriate human oversight: a model that expresses calibrated doubt invites users to verify and exercise judgment. Realizing it requires shifts on both sides: benchmarks that reward calibrated uncertainty rather than only accuracy, and users who expect and can interpret it. This complicates the related note "Does reasoning fine-tuning make models worse at declining to answer?": faithful uncertainty is the richer target that pure abstention only crudely approximates.
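The "control layer" role in agentic tool use can be sketched as a simple routing rule. This is an assumed policy for illustration, not one described in the paper: `choose_action`, its action labels, and the 0.9 default threshold are all hypothetical.

```python
def choose_action(confidence, tool_available, confident_at=0.9):
    """Route a query based on the model's own confidence (assumed policy).

    - High confidence: answer from parametric knowledge.
    - Lower confidence with a tool on hand: look the fact up instead.
    - Lower confidence without a tool: answer, but say so with a hedge.
    """
    if confidence >= confident_at:
        return "answer"
    if tool_available:
        return "call_tool"
    return "answer_with_hedge"


# Invented confidence values to show the three branches.
for confidence, tool_available in [(0.95, True), (0.6, True), (0.6, False)]:
    print(confidence, tool_available, choose_action(confidence, tool_available))
```

Under this policy the routing is only as good as the confidence signal feeding it, which is why faithfulness of that signal matters more than the routing rule itself.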
Related concepts in this collection 3
- Does reasoning fine-tuning make models worse at declining to answer?
  When models are trained to reason better, do they lose the ability to say 'I don't know'? This matters for high-stakes applications like medical and legal AI that depend on appropriate uncertainty.
  Faithful uncertainty is the graded target that abstention approximates; reasoning training erodes both.
- Can a model be truthful without actually being honest?
  Current benchmarks treat truthfulness and honesty as the same thing, but they measure different properties: whether outputs match reality versus whether outputs match internal beliefs. What happens if they diverge?
  Faithful uncertainty is an operational form of honesty distinct from being correct.
- Can LLM explanations actually help humans predict model behavior?
  Do model explanations enable users to accurately simulate how the model will behave on related inputs? This matters because it determines whether explanations genuinely improve human understanding or just create an illusion of understanding.
  Both warn that expressed signals (explanations, confidence) can diverge from internal state unless explicitly aligned.
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions
- Hallucinations Undermine Trust; Metacognition is a Way Forward
- TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning
- Humans overrely on overconfident language models, across languages
- Can LLMs Ground when they (Don't) Know: A Study on Direct and Loaded Political Questions
- Linguistic Calibration of Long-Form Generations
- Representation Engineering: A Top-Down Approach to AI Transparency
- Deep Research: A Systematic Survey
Original note title
faithful uncertainty dissolves the answer-or-abstain dilemma by aligning expressed uncertainty with intrinsic uncertainty — a metacognitive third path