Can any computable LLM truly avoid hallucinating?
Explores whether formal theorems prove hallucination is mathematically inevitable for all computable language models, regardless of their design or training approach.
The empirical observation that LLMs hallucinate has a formal foundation. Using results from learning theory, three theorems establish hallucination as mathematically inevitable:
- Theorem 1: For any computably enumerable set of LLMs, there exists a computable ground truth function such that every state of every LLM in the set hallucinates
- Theorem 2: The hallucination occurs on infinitely many inputs, not just on isolated edge cases
- Theorem 3: For any individual computable LLM, hallucination is inevitable, again on infinitely many inputs
Corollary: LLMs cannot prevent themselves from hallucinating. Internal mechanisms — self-correction, chain-of-thought prompting, self-verification — are provably insufficient.
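The construction behind Theorem 1 is essentially a diagonal argument. The toy sketch below illustrates the idea only; it is not the papers' proof, and the model callables, integer-string inputs, and yes/no answer alphabet are invented stand-ins. It shows why any fixed enumeration of computable models can be paired with a computable ground truth that each model contradicts on the inputs assigned to it:

```python
# Toy illustration of the diagonal idea behind Theorem 1 (not the paper's proof).
# Real LLMs and the ground truth are functions over all finite strings; here we
# use hypothetical model callables over stringified integers to show why any
# fixed enumeration of computable models can be forced to disagree with some
# computable ground truth function f.

def build_adversarial_ground_truth(models, answers=("yes", "no")):
    """Return a ground truth f that every model in `models` disagrees with
    on every input assigned to it by round-robin (infinitely many inputs,
    in the infinite setting)."""
    def f(s: str) -> str:
        k = int(s)                         # toy inputs are stringified integers
        owner = models[k % len(models)]    # input k is "owned" by one model
        predicted = owner(s)
        # Answer with something the owning model did NOT say.
        return answers[0] if predicted != answers[0] else answers[1]
    return f

# Hypothetical stand-ins for an enumeration of computable LLMs.
model_a = lambda s: "yes"
model_b = lambda s: "no" if int(s) % 2 else "yes"

f = build_adversarial_ground_truth([model_a, model_b])
for k in range(6):
    s = str(k)
    owner = [model_a, model_b][k % 2]
    assert owner(s) != f(s)   # the owning model hallucinates on its own inputs
```

Because f is defined by consulting each model and answering differently, it stays computable whenever the models are, and in the infinite setting every model owns infinitely many inputs, which is exactly the shape of Theorem 2.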
The proof operates in a formal world where hallucination is defined as inconsistency between a computable LLM and a computable ground truth function. Since this formal world is a subset of the real world, the result applies a fortiori to real LLMs.
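Stated symbolically (a paraphrase rather than the papers' exact formalism; the notation is chosen here for readability):

```latex
% S : the computably enumerable set of all finite input strings
% f : S -> S, a computable ground truth function
% h : a computable LLM, viewed as a function from inputs to outputs
%     (with h possibly indexed by its training state)
%
% h hallucinates with respect to f iff it is inconsistent with f somewhere:
\[
  \exists\, s \in S \;:\; h(s) \neq f(s)
\]
% Theorem 2 strengthens the existential: the set { s : h(s) != f(s) } is infinite.
```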
This has a precise practical consequence: the entire class of approaches trying to "solve hallucination" by making the model better at self-checking is provably limited. External safeguards — retrieval augmentation, human verification, formal verification systems — are not nice-to-haves but mathematical necessities.
The connection to Should we call LLM errors hallucinations or fabrications? is instructive: the formal inevitability result applies regardless of what we call it. Whether termed hallucination, fabrication, or confabulation, the mathematical constraint is the same. But the fabrication framing better suggests the right response — external grounding rather than internal improvement.
Partial mitigation via entity recognition self-knowledge: While hallucination cannot be eliminated, the discovery that entity recognition acts as a self-knowledge mechanism in base models, one that causally steers hallucination, suggests a partial internal mitigation path. SAE analysis on Gemma 2 reveals features the model uses to detect whether it "knows" an entity, and chat finetuning repurposes this mechanism for both hallucination control and refusal decisions. This does not contradict formal inevitability: the mechanism reduces hallucination frequency rather than eliminating it. But it shows the internal landscape is richer than the formal proof alone implies. See Do models know what they don't know?.
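To make "causally steers" concrete, the hedged sketch below shows the standard activation-steering recipe: add an SAE feature's decoder direction into the residual stream during generation and observe how the hallucination versus refusal balance shifts. Every specific value here (layer index, feature index, the `model` and `sae` objects) is a hypothetical placeholder, not the actual Gemma 2 setup from the cited work:

```python
# Hedged sketch of activation steering with an SAE "known entity" feature.
# Hypothetical placeholders: LAYER, FEATURE_ID, and the `sae` / `model`
# objects; none of these are the real Gemma 2 SAEs or feature IDs.
import torch

LAYER = 20          # hypothetical residual-stream layer to intervene on
FEATURE_ID = 1234   # hypothetical index of a "known entity" SAE feature
ALPHA = 8.0         # steering strength (sign and scale would need tuning)

def make_steering_hook(direction: torch.Tensor, alpha: float):
    """Forward hook that adds alpha * direction to a layer's output,
    nudging the model toward "I recognize this entity" (or away, if alpha < 0)."""
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + alpha * direction.to(hidden.dtype).to(hidden.device)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return hook

# Usage outline (placeholders, so left as comments):
# direction = sae.decoder_directions[FEATURE_ID]              # shape: (d_model,)
# handle = model.layers[LAYER].register_forward_hook(
#     make_steering_hook(direction, ALPHA))
# ... generate text and compare hallucination vs. refusal rates ...
# handle.remove()
```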
Strengthened formalization (Comprehensive Hallucination Taxonomy, 2508.01781): A later paper extends the inevitability framework with an orthogonal taxonomy and stronger theorems. The taxonomy organizes hallucinations along two independent axes: intrinsic (contradicting input context) versus extrinsic (inconsistent with training data or reality), and factuality (absolute correctness against verified sources) versus faithfulness (adherence to provided input). These axes cross to produce four categories that existing hallucination mitigation techniques treat unevenly. The paper also strengthens the formal result with three theorems that jointly give:
- Theorem 1: For any computably enumerable set of LLMs, there exists a computable ground truth function f such that all states of all LLMs in that set will hallucinate (all currently proposed polynomial-time bounded LLMs are inherently prone).
- Theorem 2: The same holds on infinitely many inputs — hallucinations are not isolated incidents.
- Theorem 3: Generalizes inevitability to any individual computable LLM, confirming that current and future LLMs will always exhibit some form of hallucination.
- Corollary 1: No computable LLM can prevent itself from hallucinating. LLMs cannot rely solely on internal mechanisms (self-correction, CoT prompting); external safeguards are essential.
The corollary is the sharpest practical result. The entire class of "make the model better at self-checking" approaches is provably insufficient. This now applies not just to factual inaccuracy (the traditional hallucination frame) but to newer subtypes like prompt-induced conceptual-blending hallucinations — see Do language models evaluate semantic legitimacy when fusing concepts?, which shows that hallucination extends beyond factual inaccuracy into semantic-legitimacy failure, and that no internal mechanism will eliminate it there either.
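As an aside, the two-axis taxonomy from the strengthened formalization above can be made concrete with a small sketch; the axis descriptions follow the prose summary, while the class and field names are invented for illustration:

```python
# Minimal sketch of the two-axis hallucination taxonomy described above.
# Axis descriptions follow the prose summary; names are illustrative only.
from dataclasses import dataclass
from enum import Enum

class Origin(Enum):
    INTRINSIC = "contradicts the provided input context"
    EXTRINSIC = "inconsistent with training data or external reality"

class Criterion(Enum):
    FACTUALITY = "judged against verified external sources"
    FAITHFULNESS = "judged against the provided input"

@dataclass
class HallucinationLabel:
    origin: Origin
    criterion: Criterion

    def category(self) -> str:
        # The two independent axes cross into four categories.
        return f"{self.origin.name.lower()} / {self.criterion.name.lower()}"

# Example: a summary that contradicts its source document is an intrinsic
# faithfulness failure, regardless of whether its claims are true of the world.
label = HallucinationLabel(Origin.INTRINSIC, Criterion.FAITHFULNESS)
print(label.category())   # intrinsic / faithfulness
```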
Source: Flaws; enriched from Knowledge Graphs
Related concepts in this collection
- Should we call LLM errors hallucinations or fabrications? Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening. (Relevance: terminology shapes intervention strategy; this formal result confirms that internal-only fixes are provably insufficient.)
- What limits how much models can improve themselves? Explores whether self-improvement has fundamental boundaries set by how well models can verify versus generate solutions, and what this means across different task types. (Relevance: another formal bound on self-correction; hallucination inevitability is the limit case.)
- Can we detect when language models confabulate? Current uncertainty metrics fail to catch inconsistent outputs that look confident. Could measuring semantic divergence across samples reveal confabulation signals that token-level metrics miss? (Relevance: detection is possible even when elimination is not; shifts the goal from prevention to identification.)
- Do models know what they don't know? Can language models develop internal representations that track their own knowledge boundaries? This matters because understanding self-knowledge mechanisms could explain how models choose between hallucination and refusal. (Relevance: partial internal mitigation; entity self-knowledge reduces hallucination frequency without contradicting formal inevitability.)
- Do language models evaluate semantic legitimacy when fusing concepts? Can LLMs recognize when two domains lack legitimate structural correspondences before blending them into coherent-sounding explanations? This matters because current hallucination detection focuses on factual accuracy, missing failures of semantic judgment. (Relevance: subtype beyond factual inaccuracy; inevitability theorem extends here and current mitigation tooling does not.)
Original note title: hallucination is formally inevitable for any computable LLM regardless of architecture or training