LLM Reasoning and Architecture · Language Understanding and Pragmatics

Can any computable LLM truly avoid hallucinating?

Explores whether formal theorems prove hallucination is mathematically inevitable for all computable language models, regardless of their design or training approach.

Note · 2026-02-23 · sourced from Flaws
What do language models actually know?

The empirical observation that LLMs hallucinate has a formal foundation. Using results from learning theory, three theorems establish hallucination as mathematically inevitable:

  1. Theorem 1 — For any enumerable set of LLMs, there exists a computable ground truth function with respect to which every state of every LLM in the set hallucinates
  2. Theorem 2 — The hallucination occurs on infinitely many inputs, not just on edge cases
  3. Theorem 3 — For any individual computable LLM, hallucination is inevitable, and again on infinitely many inputs

Corollary: LLMs cannot prevent themselves from hallucinating. Internal mechanisms — self-correction, chain-of-thought prompting, self-verification — are provably insufficient.

The proof operates in a formal world where hallucination is defined as inconsistency between a computable LLM and a computable ground truth function. Since this formal world is a subset of the real world, the result applies a fortiori to real LLMs.
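
To make the formal world concrete, here is a minimal restatement of the definition and of the diagonal construction behind Theorem 1. The notation (S, f, h, s_i) is chosen for this note, not copied from the paper.

```latex
% Formal-world setup (notation chosen for this note; the paper's symbols may differ).
% S is the set of finite input strings. A ground truth f and an LLM h are both
% computable functions from S to S.
%
% Hallucination is any disagreement between h and f:
\[
  h \text{ hallucinates w.r.t. } f \iff \exists\, s \in S : h(s) \neq f(s)
\]
% Diagonal construction behind Theorem 1: given a computable enumeration
% h_0, h_1, h_2, \dots of LLM states and an enumeration s_0, s_1, s_2, \dots
% of inputs, pick a ground truth that disagrees with the i-th LLM state on
% the i-th input:
\[
  f(s_i) := \text{some value} \neq h_i(s_i) \quad \text{for every } i
\]
% Then every enumerated LLM state hallucinates with respect to f.
```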

This has a precise practical consequence: the entire class of approaches trying to "solve hallucination" by making the model better at self-checking is provably limited. External safeguards — retrieval augmentation, human verification, formal verification systems — are not nice-to-haves but mathematical necessities.
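
As a concrete illustration of what "external safeguard" means here, below is a minimal sketch of a retrieval-gated answer loop. Every name in it (generate_answer, retrieve_evidence, claim_is_supported) is a hypothetical placeholder rather than a real API; the only point it makes is that the accept/reject decision sits outside the model.

```python
# Minimal sketch of an *external* safeguard: the model's output is not trusted
# until it is checked against retrieved evidence outside the model itself.
# All names below (generate_answer, retrieve_evidence, claim_is_supported)
# are hypothetical placeholders, not a real library API.

from typing import Optional


def generate_answer(model, question: str) -> str:
    """Placeholder for an LLM call; any generation API could sit here."""
    return model(question)


def retrieve_evidence(question: str) -> list[str]:
    """Placeholder for a retrieval system (search index, vector store, ...)."""
    raise NotImplementedError


def claim_is_supported(answer: str, evidence: list[str]) -> bool:
    """Placeholder for an external checker (entailment model, human review, ...)."""
    raise NotImplementedError


def answer_with_external_check(model, question: str) -> Optional[str]:
    # The model generates freely; by the inevitability result, no amount of
    # internal self-checking guarantees the answer is consistent with ground truth.
    answer = generate_answer(model, question)
    evidence = retrieve_evidence(question)
    # The accept/reject decision lives outside the model: that is the point.
    if claim_is_supported(answer, evidence):
        return answer
    return None  # abstain rather than return an unverified claim
```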

The connection to Should we call LLM errors hallucinations or fabrications? is instructive: the formal inevitability result applies regardless of what we call it. Whether termed hallucination, fabrication, or confabulation, the mathematical constraint is the same. But the fabrication framing better suggests the right response — external grounding rather than internal improvement.

Partial mitigation via entity recognition self-knowledge: While hallucination cannot be eliminated, the discovery that base models contain an entity-recognition self-knowledge mechanism that causally steers hallucination behavior suggests a partial internal mitigation path. SAE analysis on Gemma 2 reveals features the model uses to detect whether it "knows" an entity, and chat finetuning repurposes this mechanism for both hallucination control and refusal decisions. This does not contradict formal inevitability (the mechanism reduces hallucination frequency; it does not eliminate it), but it shows the internal landscape is richer than the formal proof implies. See Do models know what they don't know?.
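
A rough sketch of how such a self-knowledge feature could be turned into a risk signal follows. The SAE encoder, the feature index, and the threshold are all invented for illustration; the source only establishes that such features exist and causally steer behavior.

```python
# Hypothetical sketch: read out a sparse-autoencoder "known entity" feature
# and use it to decide whether to answer or abstain. The SAE weights, the
# feature index, and the threshold are placeholders, not values from the paper.

import numpy as np


def entity_knownness(residual_activation: np.ndarray,
                     sae_encoder: np.ndarray,
                     known_entity_feature: int) -> float:
    """Project a residual-stream activation through an SAE encoder and return
    the activation of a single hypothetical 'known entity' feature."""
    feature_acts = np.maximum(sae_encoder @ residual_activation, 0.0)  # ReLU SAE
    return float(feature_acts[known_entity_feature])


def should_abstain(residual_activation: np.ndarray,
                   sae_encoder: np.ndarray,
                   known_entity_feature: int,
                   threshold: float = 1.0) -> bool:
    # Low activation of the "known entity" feature is treated as a signal that
    # the model lacks knowledge of the entity and should abstain. This can
    # reduce hallucination frequency; it cannot eliminate hallucination.
    return entity_knownness(residual_activation, sae_encoder,
                            known_entity_feature) < threshold
```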

Strengthened formalization (Comprehensive Hallucination Taxonomy, 2508.01781): A later paper extends the inevitability framework with an orthogonal taxonomy and stronger theorems. The taxonomy organizes hallucinations along two independent axes: intrinsic (contradicting the input context) versus extrinsic (inconsistent with training data or reality), and factuality (absolute correctness against verified sources) versus faithfulness (adherence to the provided input). These axes cross to produce four categories that existing mitigation techniques treat unevenly (a minimal sketch of the crossing appears below). The paper also strengthens the formal result with three theorems.
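
The sketch below just enumerates the cross product of the two axes; the combined labels are illustrative shorthand for this note, not terminology from the paper.

```python
# Sketch of the 2x2 taxonomy as a cross product of the two axes described
# above. The axis names come from the note; the combined labels are only
# illustrative shorthand.

from itertools import product

GROUNDING = ("intrinsic",     # contradicts the provided input context
             "extrinsic")     # inconsistent with training data or reality
CRITERION = ("factuality",    # judged against verified external sources
             "faithfulness")  # judged against the provided input

# Crossing the axes yields four categories, e.g. "intrinsic-factuality".
CATEGORIES = [f"{g}-{c}" for g, c in product(GROUNDING, CRITERION)]
```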

The corollary is the sharpest practical result. The entire class of "make the model better at self-checking" approaches is provably insufficient. This now applies not just to factual inaccuracy (the traditional hallucination frame) but to newer subtypes like prompt-induced conceptual-blending hallucinations — see Do language models evaluate semantic legitimacy when fusing concepts?, which shows that hallucination extends beyond factual inaccuracy into semantic-legitimacy failure, and that no internal mechanism will eliminate it there either.


Source: Flaws; enriched from Knowledge Graphs

Original note title: hallucination is formally inevitable for any computable LLM regardless of architecture or training