Hallucinations Undermine Trust; Metacognition is a Way Forward

Paper · arXiv 2605.01428 · Published May 2, 2026

Despite significant strides in factual reliability, errors (often termed hallucinations) remain a major concern for generative AI, especially as LLMs are increasingly expected to be helpful in more complex or nuanced setups. Yet even in the simplest setting, factoid question-answering with clear ground truth, frontier models without external tools continue to hallucinate. We argue that most factuality gains in this domain have come from expanding the model’s knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown). We conjecture that the latter is inherently difficult: models may lack the discriminative power to perfectly separate truths from errors, creating an unavoidable tradeoff between eliminating hallucinations and preserving utility. This tradeoff dissolves under a different framing. If we understand hallucinations as confident errors (incorrect information delivered without appropriate qualification), a third path emerges beyond the answer-or-abstain dichotomy: expressing uncertainty. We propose faithful uncertainty: aligning linguistic uncertainty with intrinsic uncertainty. This is one facet of metacognition, the ability to be aware of one’s own uncertainty and to act on it.
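The tradeoff described in the abstract can be made concrete with a toy example. The sketch below is an illustration only, not the paper's method: the confidence scores and correctness labels are made-up data, and the function names (`answer_or_abstain`, `answer_or_hedge`) are hypothetical. It assumes the model exposes some scalar confidence that only imperfectly separates correct from incorrect answers, then compares abstaining below a threshold with hedging below it.

```python
from typing import List, Tuple

# Hypothetical toy data: per-question confidence and whether the answer is right.
# Note the imperfect separation: one wrong answer has higher confidence (0.85)
# than several right ones.
CONFIDENCES: List[float] = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60, 0.50, 0.40]
CORRECT: List[bool] = [True, True, False, True, True, False, True, False]


def answer_or_abstain(
    confidences: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Answer only when confidence >= threshold, otherwise abstain.

    Returns (error_rate, utility), both as fractions of all questions:
    error_rate counts wrong answers given, utility counts right answers given.
    """
    n = len(confidences)
    pairs = list(zip(confidences, correct))
    wrong = sum(1 for c, ok in pairs if c >= threshold and not ok)
    right = sum(1 for c, ok in pairs if c >= threshold and ok)
    return wrong / n, right / n


def answer_or_hedge(
    confidences: List[float], correct: List[bool], threshold: float
) -> Tuple[float, float]:
    """Always answer, but attach an uncertainty qualifier below the threshold.

    Returns (confident_error_rate, utility): only wrong answers delivered
    WITHOUT a qualifier count as confident errors, while every right answer
    (hedged or not) still reaches the user.
    """
    n = len(confidences)
    pairs = list(zip(confidences, correct))
    confident_wrong = sum(1 for c, ok in pairs if c >= threshold and not ok)
    surfaced_right = sum(1 for ok in correct if ok)
    return confident_wrong / n, surfaced_right / n


if __name__ == "__main__":
    for t in (0.0, 0.65, 0.88):
        print(t, answer_or_abstain(CONFIDENCES, CORRECT, t),
              answer_or_hedge(CONFIDENCES, CORRECT, t))
```

On this toy data, driving errors to zero by abstaining requires a threshold so strict that most correct answers are withheld as well. Hedging at the same threshold removes the confident errors while keeping every correct answer available. This only illustrates the framing; how useful a hedged answer is to a user is not modeled here.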

Introduction. Despite significant strides in factual reliability (Cheng et al., 2025; Grattafiori et al., 2024; Tian et al., 2023a; Wei et al., 2024b), errors, often termed “hallucinations,” remain a major concern for generative AI, especially as large language models (LLMs) are increasingly expected to be helpful in more complex or nuanced setups. These factually incorrect generations are often delivered in an authoritative tone, which risks undermining user trust and spreading misinformation (Ji et al., 2023; Steyvers et al., 2025; Zhang et al., 2025b). In this paper, we focus on a simple setting where frontier models still hallucinate: factoid question-answering with clear ground truth (setting aside long-form generation and cases of genuine ambiguity or contested claims). For models without access to external tools, we argue that most factuality gains in this domain have come from expanding the model’s knowledge boundary (encoding more facts) rather than improving awareness of that boundary (distinguishing known from unknown).

Discussion / Conclusion. We have argued that fully eliminating hallucinations faces fundamental challenges due to a discrimination gap, and proposed faithful uncertainty as a complementary objective. This metacognitive awareness becomes increasingly important as LLMs evolve into agentic systems, where it serves as the control layer for robust tool use. Faithful uncertainty connects to broader objectives in AI safety: at its core, it is a form of honesty, requiring models to accurately represent their epistemic state rather than project false confidence. Crucially, uncertainty communication enables appropriate human oversight, inviting users to verify, seek additional sources, or exercise their own judgment when models express doubt. Realizing this vision requires a shift in both model development (as current benchmarks focus on factual accuracy) and user expectations (users who expect LLMs to express their uncertainty and who can interpret that uncertainty appropriately).
