Do language models evaluate semantic legitimacy when fusing concepts?
Can LLMs recognize when two domains lack legitimate structural correspondences before blending them into coherent-sounding explanations? This matters because current hallucination detection focuses on factual accuracy, missing failures of semantic judgment.
Existing hallucination taxonomies treat hallucinations as factual inaccuracies: misattributed events, fabricated citations, incorrect dates, invented quotes. The typical mitigation assumes the problem is a model producing specific false claims that could in principle be checked against verified sources. The Hallucination-Inducing Prompt (HIP) framework reveals a subtype this taxonomy misses, prompt-induced hallucination (PIH): models producing coherent, stylistically plausible, metaphorical reasoning that lacks any domain grounding, not because it contradicts facts but because the model fails to evaluate the semantic legitimacy of the fusion it is asked to perform.
The experimental method is compact: prompts of roughly 30 tokens that synthetically fuse semantically distant concepts in ways that resist scientific integration. The prototype pairs the periodic table of elements with tarot divination. In human cognition, conceptual blending (Fauconnier & Turner) can produce novel insights through meaningful integration of disparate domains, but the blending is useful only when the source domains share legitimate structural correspondences. HIP prompts are engineered so that the source domains share no such correspondences. A cognitive engine that evaluates semantic legitimacy would either decline to fuse (as Gemini 2.5 Pro does: "tarot's mechanisms are not recognized by or testable within the current scientific paradigm") or flag the fusion as speculative. Most LLMs instead generate elaborate fusion schemes presented as defensible research proposals.
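As a concrete illustration, here is a minimal sketch of what such a probe might look like in code. The template wording and the `query_model(prompt) -> str` helper are assumptions for illustration, not the framework's actual prompts or tooling.

```python
# Minimal sketch of a HIP-style probe. `query_model` is a hypothetical callable
# supplied by the reader; the template wording is illustrative only.

HIP_TEMPLATE = (
    "Propose a rigorous, testable research program that unifies {domain_a} "
    "with {domain_b} into a single predictive framework."
)

def build_hip_prompt(domain_a: str, domain_b: str) -> str:
    """Fuse two semantically distant domains into a short (~30-token) prompt."""
    return HIP_TEMPLATE.format(domain_a=domain_a, domain_b=domain_b)

def run_probe(query_model, pairs):
    """Send each fused prompt to the model and keep raw responses for later scoring."""
    return {(a, b): query_model(build_hip_prompt(a, b)) for a, b in pairs}

# The prototype pairing from the note:
pairs = [("the periodic table of elements", "tarot divination")]
```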
The HIP + Hallucination Quantifying Prompt (HQP) framework evaluates this across GPT-4o, GPT-o3, Gemini 2.0/2.5, and DeepSeek. GPT-o3 responds with "Below is a roadmap you can use to turn the idea of periodic-table-meets-tarot into a defensible, testable prediction system", framing the fusion as genuine science complete with a research agenda. DeepSeek produces "Major Arcana as Elements: The Fool as Hydrogen, The Magician as Carbon, The World as Uranium" and "Quantum Mysticism: Some fringe theories link consciousness to atomic behavior." The HQP analysis judges these responses as resting "heavily on creative conjecture rather than demonstrable fact" and assigns them high hallucination scores. The responses are not factually wrong in the sense of contradicting any specific fact-lookup query. They are wrong in the sense that the entire fusion framework is unjustified, and the model proceeded as if that framework were the user's legitimate research direction rather than a probe of semantic legitimacy.
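A rough sketch of how an HQP-style scoring pass might be wired, reusing the hypothetical `query_model` helper from above. The rubric wording and the 0-10 scale are assumptions for illustration, not quoted from the framework.

```python
# Illustrative HQP-style judging pass: a second model call rates whether the
# fused response rests on demonstrable domain grounding or on conjecture.
# The rubric text and scale are assumptions, not the framework's own.

HQP_TEMPLATE = (
    "Rate the following answer from 0 to 10, where 0 means every claim is "
    "grounded in established domain knowledge and 10 means the answer rests "
    "entirely on creative conjecture with no domain grounding. "
    "Reply with the number only.\n\nAnswer:\n{answer}"
)

def hqp_score(query_model, answer: str) -> float:
    """Ask a judge model to quantify how conjectural the fused response is."""
    reply = query_model(HQP_TEMPLATE.format(answer=answer))
    try:
        return float(reply.strip())
    except ValueError:
        return float("nan")  # the judge did not return a bare number
```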
This is a category-level failure missed by hallucination taxonomies that take factual inaccuracy as the base unit. The PIH subtype is a failure of semantic legitimacy evaluation, and it bears on several adjacent observations. Can LLMs generate more novel ideas than human experts? HIP failure is exactly that ideation-evaluation dissociation: combinatorial fusion proceeds without any evaluative stance on whether the fusion is legitimate. Do large language models reason symbolically or semantically? HIP shows the mirror failure: semantics can be fused in-context without being evaluated in-context. Do LLMs compress concepts more aggressively than humans do? Compression-first models may find structural similarity between any two concept clouds and treat that similarity as legitimacy.
The Meaning Gap angle becomes specific here. The note "Can LLMs truly understand literary meaning or just mechanics?" identified evaluative stance as structurally absent in literary domains. HIP generalizes the point: evaluative stance is absent in any domain where the response requires judging whether a conceptual operation is legitimate rather than merely executing it. The failure mode is uniform across literary meaning, conceptual blending, and scientific fusion: each requires evaluating whether the operation at hand is the kind of operation the domain admits, and LLMs cannot perform that meta-level evaluation.
The practical implication for hallucination mitigation: retrieval-augmented generation, fact-checking, and verification pipelines address factual inaccuracy. None of them address PIH, because the model is not claiming specific facts that can be looked up. The response to "map tarot cards to elements" is not false in the way that "Haruki Murakami won the Nobel Prize" is false. It is the wrong kind of response — and there is no verification infrastructure that catches wrong-kind-of-response failures. Gemini 2.5 Pro's refusal is the target behavior, and nothing in current mitigation tooling encourages it.
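To make the gap concrete, here is a minimal sketch of the kind of legitimacy gate that current pipelines lack. The gate prompt, the verdict labels, and the `query_model` helper are hypothetical, introduced only to illustrate what flagging or declining an illegitimate fusion could look like; nothing like this exists in standard mitigation tooling.

```python
# Hypothetical pre-generation legitimacy gate: before executing a conceptual
# fusion, ask whether the fusion is the kind of operation the domains admit.
# The prompt and labels below are assumptions for illustration only.

LEGITIMACY_GATE = (
    "Before answering, judge whether the following request fuses domains that "
    "share legitimate structural correspondences. Reply with exactly one of "
    "LEGITIMATE, SPECULATIVE, or ILLEGITIMATE on the first line, followed by "
    "one sentence of justification.\n\nRequest:\n{request}"
)

def gated_answer(query_model, request: str) -> str:
    """Decline or flag requests whose conceptual fusion lacks domain grounding."""
    verdict = query_model(LEGITIMACY_GATE.format(request=request)).strip().upper()
    if verdict.startswith("ILLEGITIMATE"):
        return "Declined: the requested fusion lacks legitimate structural correspondences."
    if verdict.startswith("SPECULATIVE"):
        return "Speculative framing only:\n" + query_model(request)
    return query_model(request)
```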
Source: Flaws
Related concepts in this collection
- Can LLMs generate more novel ideas than human experts?
  Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
  HIP failure is exactly the ideation-evaluation dissociation.
- Can LLMs truly understand literary meaning or just mechanics?
  LLMs excel at extracting metaphors, detecting style, and analyzing structure. But can they access the deeper meaning that emerges through implication, ambiguity, and evaluative judgment—the dimensions where literature actually lives?
  Meaning Gap note; HIP shows evaluative-stance absence beyond the literary domain.
- Do large language models reason symbolically or semantically?
  Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
  Semantic fusion operates even where semantic evaluation cannot.
- Do LLMs compress concepts more aggressively than humans do?
  Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
  Compression-first processing treats structural similarity as legitimacy.
- Can any computable LLM truly avoid hallucinating?
  Explores whether formal theorems prove hallucination is mathematically inevitable for all computable language models, regardless of their design or training approach.
  The inevitability theorem extends to subtypes beyond factual inaccuracy.
- Should we call LLM errors hallucinations or fabrications?
  Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
  Terminology matters; PIH reinforces that "hallucination" is too narrow a label.
- Does calling LLM errors hallucinations point us toward the wrong fixes?
  Explores whether the metaphor of 'hallucination' for LLM errors misdirects our efforts. The terminology we choose shapes which interventions we prioritize and how we conceptualize the underlying problem.
  Writing angle; PIH is another case where the wrong label drives the wrong fix.
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  The HIP prompt presumes common ground about the legitimacy of the conceptual fusion that does not exist; the model accepts that presupposition.
- Why do language models accept false assumptions they know are wrong?
  Explores why LLMs fail to reject false presuppositions embedded in questions even when they possess correct knowledge about the topic. This matters because it reveals a grounding failure distinct from knowledge deficits.
  PIH is a specific case: the false presupposition is that tarot and chemistry are legitimately fusable.
- Do reasoning traces actually cause correct answers?
  Explores whether the intermediate 'thinking' tokens in R1-style models genuinely drive reasoning or merely mimic its appearance. Matters because false confidence in invalid traces could mask errors.
  A sibling failure mode: both PIH and derivational traces produce fluent, confident, plausible-sounding output that generates false user confidence; in PIH the stylistic plausibility of the fusion masks the semantic illegitimacy, just as trace plausibility masks unverified reasoning.
Original note title: prompt-induced hallucination is a distinct subtype — models fail to evaluate the semantic legitimacy of blended concepts and produce coherent metaphorical reasoning that lacks domain grounding