LLM Reasoning and Architecture · Language Understanding and Pragmatics

Can language models understand a task without executing it correctly?

Do LLMs truly comprehend problem-solving principles if they consistently fail to apply them? This note explores whether the gap between articulate explanations and failed actions points to a fundamental architectural limitation.

Note · 2026-02-23 · sourced from Flaws
How do LLMs fail to do what they seem to understand?

LLMs display surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. The diagnosis: a persistent gap between comprehension and competence, rooted not in knowledge access but in computational execution.

The paper names this "computational split-brain syndrome" — instruction and action pathways are geometrically and functionally dissociated within the model. The model can articulate the correct principle for how to solve a problem, then fail to apply that principle in the next step. This is not forgetting, not hallucination, not knowledge deficit — it is a structural disconnect between knowing-how-to-describe and knowing-how-to-do.

The failure recurs across domains: mathematical operations, relational inferences, logical deductions. This consistency suggests an architectural rather than a domain-specific cause. LLMs function as powerful pattern-completion engines but lack the scaffolding for principled, compositional reasoning: the structure needed to execute what they can describe.

This provides a mechanistic name for the question "Can LLMs understand concepts they cannot apply?". Potemkin understanding names the phenomenon; computational split-brain names the mechanism. The geometric separation between instruction representations and execution pathways explains why the model can generate correct explanations and incorrect applications simultaneously without detecting the inconsistency.

It also concretizes "Why do language models fail to act on their own reasoning?". The 87% vs 64% gap is the quantitative signature of the split-brain: the instruction pathway (rationale generation) and the execution pathway (action selection) draw on overlapping but dissociated representations.
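
As a rough illustration, the sketch below scores the same problems twice: once for the stated principle, once for the applied answer, so the spread between the two accuracies isolates execution rather than knowledge. Everything here is a placeholder assumption (the `ask_model` stub, the prompts, the crude substring grader), not the protocol used in the source.

```python
# Minimal sketch of a rationale-vs-action probe. Assumes `ask_model` is
# wired to some LLM client; prompts and grading are illustrative only.

def ask_model(prompt: str) -> str:
    raise NotImplementedError  # plug in your LLM client here

def rationale_action_gap(problems: list[dict]) -> tuple[float, float]:
    """Score each problem twice and return (rationale_acc, action_acc).

    Each problem is a dict with 'question', 'principle' (a key phrase the
    correct rationale should contain), and 'answer' (the exact final answer).
    A large spread between the two accuracies is the split-brain signature
    (87% vs 64% in the source).
    """
    rationale_hits = 0
    action_hits = 0
    for p in problems:
        stated = ask_model(f"State the principle for solving: {p['question']}")
        applied = ask_model(f"Solve, giving only the final answer: {p['question']}")
        rationale_hits += int(p["principle"].lower() in stated.lower())  # crude grading
        action_hits += int(applied.strip() == p["answer"])
    n = len(problems)
    return rationale_hits / n, action_hits / n
```

Any serious version would need a stronger rationale grader than substring matching; the point is only that both accuracies are measured on identical items.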

The paper further argues that mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles — the internal structures we discover may be execution artifacts, not reasoning architecture.

Planning as the paradigmatic test case. The 8-puzzle study (On the Limits of Innate Planning in Large Language Models) isolates two specific deficits: (1) brittle internal state representations that produce frequent invalid moves, and (2) weak heuristic planning, with models entering loops or selecting actions that do not reduce distance to the goal. Even with an external move validator offering only valid moves (a sketch of such a harness appears below), none of the models solved any puzzle. The comprehension-competence split is stark: models can articulate puzzle-solving strategies but cannot maintain accurate state representations across sequential moves. Together with "Can large language models actually create executable plans?", the results show the gap widening with task complexity: 87% correct rationales → 64% correct actions → 12% executable plans → 0% puzzle solutions even with validator assistance.
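
To make the validator setup concrete, here is a minimal sketch of such a harness, assuming a 3x3 board encoded as a flat tuple and a `choose_move` callable standing in for the model; the names and loop structure are illustrative, not the paper's code. Even when only legal moves are offered, success still requires tracking the state across turns and picking moves that shrink the distance to the goal, which is precisely where the study reports failure.

```python
# Hedged sketch of an external move validator for the 8-puzzle: the harness
# computes legality, so the model only ever chooses among valid moves.
from typing import Callable, List, Tuple

State = Tuple[int, ...]  # row-major 3x3 board, 0 marks the blank
GOAL: State = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def valid_moves(state: State) -> List[int]:
    """Board positions of tiles adjacent to the blank (the only legal slides)."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    moves = []
    if row > 0:
        moves.append(blank - 3)  # tile above the blank
    if row < 2:
        moves.append(blank + 3)  # tile below the blank
    if col > 0:
        moves.append(blank - 1)  # tile left of the blank
    if col < 2:
        moves.append(blank + 1)  # tile right of the blank
    return moves

def apply_move(state: State, pos: int) -> State:
    """Slide the tile at `pos` into the blank square."""
    blank = state.index(0)
    board = list(state)
    board[blank], board[pos] = board[pos], board[blank]
    return tuple(board)

def run_episode(choose_move: Callable[[State, List[int]], int],
                start: State, max_steps: int = 60) -> bool:
    """Validator loop: the model picks only from a menu of valid moves."""
    state = start
    for _ in range(max_steps):
        if state == GOAL:
            return True
        options = valid_moves(state)
        pos = choose_move(state, options)  # e.g. an LLM selecting from the menu
        assert pos in options, "validator guarantees only menu moves are applied"
        state = apply_move(state, pos)
    return state == GOAL
```

The deficits the study isolates live entirely inside `choose_move`: losing track of `state` across turns, or cycling through moves that never reduce distance to `GOAL`.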


Source: Flaws

Original note title: comprehension without competence is a distinct LLM failure mode — instruction and execution pathways are dissociated