How do semantic failure modes map to attentional and intentional layers?
This explores whether failures of *meaning* (semantic errors) actually live at the meaning layer, or whether they trace back to two deeper layers: what the model attends to, and what intent it (or the user) is trying to track. The corpus suggests a recurring move: a failure that *looks* semantic is usually misdiagnosed, and the fix only works once the failure is relocated to the right layer.
Start with the semantic layer itself, where it's most legible. Work using Abstract Meaning Representation (AMR) breaks dialogue incoherence into four concrete types (contradiction, coreference inconsistency, irrelevancy, and decreased engagement) and shows that these are detectable at the meaning level even when surface-level text manipulations miss them entirely (see "What semantic failures break dialogue coherence most realistically?"). So semantic failures are real and nameable. But naming them isn't the same as locating their cause. The argument that LLM errors are *fabrication*, not hallucination or confabulation, makes exactly this point: accurate and inaccurate outputs use identical machinery, so words like 'hallucination' misdirect the fix toward perception or memory, which are the wrong layers (see "Should we call LLM errors hallucinations or fabrications?").
The attentional layer is where several apparent meaning failures actually resolve. The sharpest case: verbose chain-of-thought (CoT) *degrades* fine-grained multimodal perception, because the real bottleneck is visual attention allocation, not verbalization, so optimizing the text policy trains the wrong target (see "Does verbose chain-of-thought actually help multimodal perception tasks?"). Token-level memorization tells a parallel story from inside the reasoning chain: local memorization, driven by the immediately preceding tokens, accounts for up to 67% of reasoning errors, meaning the model's attention is captured by what's nearby rather than what's relevant (see "Where do memorization errors arise in chain-of-thought reasoning?"). And the finding that models trained on systematically irrelevant traces keep their solution accuracy suggests the chain functions as attentional scaffolding, structure that holds computation in place, rather than as a carrier of meaning (see "Do reasoning traces need to be semantically correct?"). CoT as 'constrained imitation' rather than inference is the same insight from another angle: structural coherence matters more than content correctness (see "Why does chain-of-thought reasoning fail in predictable ways?").
The intentional layer is the hardest to see because it spans both the model and the human. On the model side, reasoning systems show surprising deficits in social cognition (tracking goals and intentions) even while excelling at formal tasks (see "Where exactly do reasoning models fail and break?"). On the human side, the Rose-Frame work shows three cognitive traps (confusing the map for the territory, conflating intuition with reasoning, and reinforcing confirmation bias) compounding into epistemic drift: a failure of *the reader's* intent-tracking, not of the model's output (see "Why do people trust AI outputs they shouldn't?"). Here the semantic surface can be flawless while the intentional layer quietly fails.
The thing worth carrying away: these layers predict which interventions work. If the failure is attentional, grounding fixes it: interleaving reasoning with real-world tool queries injects feedback at each step, curbs the error propagation seen in pure CoT, and on interactive decision-making tasks outperforms imitation and reinforcement learning baselines by double-digit absolute margins (see "Can interleaving reasoning with real-world feedback prevent hallucination?"). If the failure is executional rather than semantic, tools dissolve the supposed 'reasoning cliff' (see "Are reasoning model collapses really failures of reasoning?"). A semantic patch on an attentional or intentional failure is wasted effort, which is exactly why the layer you assign a failure to is the most consequential decision you make about fixing it.
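To make the grounding intervention concrete, here is a minimal sketch of a ReAct-style loop that alternates an action choice with an observation from an external tool. Everything named here is an assumption for illustration: `lookup`, `FACTS`, `scripted_policy`, and `react_loop` are hypothetical, the "tool" is a toy dictionary rather than a real API, and the "model" is a hard-coded policy standing in for an LLM. It is not the original ReAct implementation, only an instance of the interleaving pattern.

```python
from typing import Callable, Dict, List, Tuple

# Toy knowledge source standing in for an external tool (hypothetical).
FACTS: Dict[str, str] = {
    "author of Hamlet": "William Shakespeare",
    "birthplace of William Shakespeare": "Stratford-upon-Avon",
}

PREFIX = "Observation: "


def lookup(query: str) -> str:
    """Toy tool: return a stored fact, or 'not found' if the query is unknown."""
    return FACTS.get(query, "not found")


def scripted_policy(question: str, history: List[str]) -> Tuple[str, str]:
    """Hypothetical stand-in for the model: choose the next (action, argument)
    based only on the observations gathered so far."""
    observed = [h[len(PREFIX):] for h in history if h.startswith(PREFIX)]
    if not observed:
        return ("lookup", "author of Hamlet")
    if len(observed) == 1:
        # The second query is built from the first observation, so the
        # step is grounded in tool feedback rather than in generated text.
        return ("lookup", f"birthplace of {observed[0]}")
    return ("finish", observed[-1])


def react_loop(
    question: str,
    policy: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Alternate policy actions with tool observations until 'finish'."""
    history: List[str] = []
    for _ in range(max_steps):
        action, arg = policy(question, history)
        history.append(f"Action: {action}[{arg}]")
        if action == "finish":
            return arg, history
        tool = tools.get(action)
        observation = tool(arg) if tool else "unknown action"
        history.append(f"{PREFIX}{observation}")
    return "no answer", history


if __name__ == "__main__":
    answer, trace = react_loop(
        "Where was the author of Hamlet born?",
        scripted_policy,
        {"lookup": lookup},
    )
    print("\n".join(trace))
    print("Answer:", answer)
```

The design point is that each step after the first conditions on an observation returned by the tool, so an early mistake surfaces as a 'not found' observation instead of propagating silently through the chain.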
Sources (10 notes)
Research using Abstract Meaning Representation identified four distinct incoherence types: contradiction, coreference inconsistency, irrelevancy, and decreased engagement. AMR-trained classifiers detect these semantic failures while text-level manipulations alone cannot.
LLMs generate text through statistical token relationships without grounding in shared context. Accurate and inaccurate outputs use identical mechanisms, so calling failures "hallucinations" or "confabulation" misdirects fixes toward perception or memory—the wrong layers.
Long rationales and text-token RL help reasoning but hurt fine-grained perception tasks because the actual bottleneck is visual attention allocation, not verbalization. Standard CoT optimization trains the wrong policy target.
STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization—based on preceding tokens—accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.
Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
CoT guides models to pattern-match reasoning structure rather than perform genuine inference. This explains distribution-bounded failures, why structural coherence matters more than content correctness, and why performance optimizes against interpretability.
Research reveals four core failure modes: exploration wandering rather than systematic search, premature thought switching, poor hybrid reasoning mode selection, and surprising deficits in social cognition despite excelling at formal tasks. Longer reasoning chains create more corruption surfaces.
Rose-Frame identifies map-territory confusion, intuition-reason conflation, and confirmation-bias reinforcement as traps that multiply their distorting effects when they co-occur. Evidence from cross-linguistic overreliance and architectural transformer biases confirms the compounding mechanism operates universally.
ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) reduces the hallucination and error propagation seen in pure chain-of-thought by injecting real-world feedback at each step. On interactive decision-making benchmarks (ALFWorld and WebShop), this approach outperforms imitation and reinforcement learning baselines by 34% and 10% absolute success rate, respectively.
Models confined to text-only generation cannot execute multi-step procedures at scale, even when they know the underlying algorithm. Tool-enabled models solve problems beyond the supposed reasoning cliff, suggesting the bottleneck is procedural execution bandwidth.