Language Understanding and Pragmatics

Do language models evaluate semantic legitimacy when fusing concepts?

Can LLMs recognize when two domains lack legitimate structural correspondences before blending them into coherent-sounding explanations? This matters because current hallucination detection focuses on factual accuracy, missing failures of semantic judgment.

Note · 2026-04-07 · sourced from Flaws
What kind of thing is an LLM really? What do language models actually know?

Existing hallucination taxonomies treat hallucinations as factual inaccuracies: misattributed events, fabricated citations, incorrect dates, invented quotes. The typical mitigation assumes the problem is a model producing specific false claims that could in principle be checked against verified sources. The Hallucination-Inducing Prompt (HIP) framework reveals a subtype this taxonomy misses, prompt-induced hallucination (PIH): models producing coherent, stylistically plausible, metaphorical reasoning that lacks any domain grounding, not because it contradicts facts but because the model never evaluates whether the underlying conceptual fusion is semantically legitimate.

The experimental method is compact: ~30-token prompts that synthetically fuse semantically distant concepts in ways that resist scientific integration. Prototype: combining the periodic table of elements with tarot divination. In human cognition, conceptual blending (Fauconnier & Turner) can produce novel insights through meaningful integration of disparate domains — the blending is useful when the source domains share legitimate structural correspondences. But HIP prompts are engineered so that the source domains share no such correspondences. A system that evaluates semantic legitimacy would either decline the fusion (as Gemini 2.5 Pro does: "tarot's mechanisms are not recognized by or testable within the current scientific paradigm") or flag it as speculative. Most LLMs instead generate elaborate fusion schemes presented as defensible research proposals.
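The probe construction can be sketched in a few lines. A minimal sketch, assuming only the constraints the framework states (brevity and semantic distance); the second domain pair and the prompt template are invented for illustration, not the framework's actual prompts:

```python
# Hypothetical sketch of HIP-style probe construction. The stated
# constraints are brevity (~30 tokens) and semantic distance between
# domains; the template and the second pair below are assumptions.

DISTANT_PAIRS = [
    ("the periodic table of elements", "tarot divination"),
    ("plate tectonics", "birth-order astrology"),  # assumed example pair
]

def make_hip_prompt(domain_a: str, domain_b: str) -> str:
    """Fuse two structurally unrelated domains into one short request."""
    return (
        f"Develop a rigorous predictive framework that maps {domain_a} "
        f"onto {domain_b}, and outline how the mapping could be tested."
    )

for a, b in DISTANT_PAIRS:
    print(make_hip_prompt(a, b))
```

The point of the construction is that the prompt's surface form is a legitimate research request; only the domain pairing is illegitimate, which is exactly what a semantic-legitimacy check would have to catch.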

The HIP + Hallucination Quantifying Prompt (HQP) framework evaluates this across GPT-4o, GPT-o3, Gemini 2.0/2.5, and DeepSeek. GPT-o3 responds with "Below is a roadmap you can use to turn the idea of periodic-table-meets-tarot into a defensible, testable prediction system" — framed as genuine science with a research agenda. DeepSeek produces "Major Arcana as Elements: The Fool as Hydrogen, The Magician as Carbon, The World as Uranium" and "Quantum Mysticism: Some fringe theories link consciousness to atomic behavior." The HQP analysis judges these as relying "heavily on creative conjecture rather than demonstrable fact," with scores reflecting high hallucination. The responses are not factually wrong in the sense of contradicting any specific fact-lookup query. They are wrong in the sense that the entire fusion framework is unjustified, and the model proceeded as if the fusion framework were the user's legitimate research direction rather than a probe of semantic legitimacy.
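The two-stage shape of the evaluation can be sketched as follows. The `ask` callable and the rubric wording are stand-ins; the framework's actual judge prompt and scoring scale are not reproduced here:

```python
# Two-stage evaluation sketch: elicit a fusion response to a HIP prompt,
# then ask a judge (via the same `ask` callable) to quantify how much it
# rests on conjecture. The rubric text is an assumed paraphrase, not the
# framework's actual HQP wording.
from typing import Callable

HQP_TEMPLATE = (
    "Assess how heavily the following answer relies on creative "
    "conjecture rather than demonstrable fact:\n\n{answer}"
)

def evaluate_hip_response(
    hip_prompt: str, ask: Callable[[str], str]
) -> tuple[str, str]:
    """Return the model's fusion response and the judge's assessment."""
    answer = ask(hip_prompt)
    judgement = ask(HQP_TEMPLATE.format(answer=answer))
    return answer, judgement

# Stub demonstrating the control flow without a live API:
canned = {"Map tarot cards to chemical elements.": "The Fool is hydrogen..."}
reply = lambda p: canned.get(p, "Relies heavily on creative conjecture.")
answer, judgement = evaluate_hip_response(
    "Map tarot cards to chemical elements.", reply
)
print(judgement)
```

Passing `ask` in as a parameter keeps the sketch runnable offline; in practice both calls would hit a model endpoint, possibly different models for generation and judging.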

This is a category-level failure missed by hallucination taxonomies that take factual inaccuracy as the base unit: the PIH subtype is a failure of semantic-legitimacy evaluation. It bears on several adjacent observations. Can LLMs generate more novel ideas than human experts? HIP failure is exactly the dissociation at issue there: combinatorial fusion proceeds without any evaluative stance on whether the fusion is legitimate. Do large language models reason symbolically or semantically? HIP shows the mirror failure: semantics can be fused in-context without being evaluated in-context. Do LLMs compress concepts more aggressively than humans do? Compressive models may find structural similarity between any two concept clouds and treat that similarity as legitimacy.

The Meaning Gap angle becomes specific here. Can LLMs truly understand literary meaning or just mechanics? identified evaluative stance as structurally absent in literary domains. HIP generalizes: evaluative stance is absent in any domain where the response requires judging whether a conceptual operation is legitimate rather than merely executing it. The failure mode is uniform across literary meaning, conceptual blending, and scientific fusion — each requires evaluating whether the operation at hand is the kind of operation this domain admits, and LLMs cannot perform that meta-level evaluation.

The practical implication for hallucination mitigation: retrieval-augmented generation, fact-checking, and verification pipelines address factual inaccuracy. None of them address PIH, because the model is not claiming specific facts that can be looked up. The response to "map tarot cards to elements" is not false in the way that "Haruki Murakami won the Nobel Prize" is false. It is the wrong kind of response — and there is no verification infrastructure that catches wrong-kind-of-response failures. Gemini 2.5 Pro's refusal is the target behavior, and nothing in current mitigation tooling encourages it.
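A toy illustration of the gap (the claim-extraction heuristic is invented, standing in for whatever first stage a real fact-checking pipeline uses): a checker keyed on verifiable specifics surfaces the Murakami sentence for verification but finds nothing checkable in a PIH-style mapping, so the pipeline never fires.

```python
# Toy claim extractor standing in for a fact-checking pipeline's first
# stage: it surfaces sentences containing checkable specifics (years,
# named awards). A PIH response yields no such sentences, so downstream
# verification has nothing to act on.
import re

def extract_checkable_claims(text: str) -> list[str]:
    """Naive heuristic: keep sentences containing dates or named awards."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    pattern = re.compile(r"\b(\d{4}|Nobel Prize)\b")
    return [s for s in sentences if pattern.search(s)]

factual_error = "Haruki Murakami won the Nobel Prize in Literature."
pih_response = "The Fool corresponds to hydrogen. The Magician maps to carbon."

print(extract_checkable_claims(factual_error))  # one checkable (false) claim
print(extract_checkable_claims(pih_response))   # []: nothing to verify
```

The empty result on the PIH response is the structural point: every sentence in it is unverifiable by construction, which is precisely why fact-lookup tooling passes it.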



prompt-induced hallucination is a distinct subtype — models fail to evaluate the semantic legitimacy of blended concepts and produce coherent metaphorical reasoning that lacks domain grounding