Language Understanding and Pragmatics · LLM Reasoning and Architecture

Can LLMs understand concepts they cannot apply?

Explores whether large language models can correctly explain ideas while simultaneously failing to use them—and whether that combination reveals something fundamentally different from ordinary mistakes.

Note · 2026-02-21 · sourced from Philosophy Subjectivity
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The Potemkin understanding paper identifies a failure pattern that is categorically different from ordinary LLM error. When a model correctly explains an ABAB rhyme scheme, then fails to generate one, then recognizes that its own generation doesn't rhyme, that triple combination is not just wrong; it is incoherent. No human who could give that explanation would behave that way: the combination is irreconcilable with any human cognitive pattern.
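The probe pattern behind that example is easy to make concrete. Below is a minimal sketch, assuming a hypothetical single-turn client `ask` (not any real library API); the paper's actual grading is done by humans and rubrics, whereas here the third step simply asks the model to grade itself.

```python
# Minimal sketch of the explain / apply / self-grade probe.
# `ask` stands for any single-turn chat-model client (a hypothetical
# callable, not a real library API). The three calls are deliberately
# independent turns, so the model cannot lean on its earlier output.
from typing import Callable

def potemkin_probe(concept: str, task: str, ask: Callable[[str], str]) -> dict:
    explanation = ask(f"Explain the concept: {concept}.")
    attempt = ask(f"{task}.")
    verdict = ask(
        f"Concept: {concept}\nCandidate output:\n{attempt}\n"
        "Does this output satisfy the concept? Answer yes or no, then explain."
    )
    return {"explanation": explanation, "attempt": attempt, "verdict": verdict}

# The ABAB case from the paper:
# result = potemkin_probe(
#     concept="an ABAB rhyme scheme",
#     task="Write a four-line poem with an ABAB rhyme scheme",
#     ask=my_model_client,  # hypothetical client
# )
# The potemkin signature: a correct explanation, a non-rhyming attempt,
# and a verdict that correctly flags the attempt as non-rhyming.
```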

This is worth separating from other LLM failure types because the mechanism matters for diagnosis and repair:

The "Potemkin" framing (after Potemkin villages — facades with nothing behind) is precise: the model passes benchmark tests designed to detect understanding because those benchmarks test the same cognitive operations as humans. The tests only work as diagnostics if LLMs misunderstand concepts the same way humans do. But Potemkin understanding means the model can perform at the surface without the underlying integration that tests were designed to probe.

Benchmarks used to evaluate LLMs were designed to evaluate people. Applying them to LLMs is valid only if LLMs fail in human-like ways. Potemkin understanding shows that this assumption does not hold: LLMs can fail in ways that no human cognitive model predicts.

The three-domain evidence (literary techniques, game theory, psychological biases) shows this is not domain-specific. The same profile appears across all three: near-perfect explanation accuracy, substantial application failure, and the model's own recognition of that failure. The incoherence is stable.
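Given per-item judgments from probes like the one above, that stability claim reduces to two simple rates, sketched here with illustrative field names (an assumption for the sketch, not the paper's schema): how often correct explanation coexists with failed application, and how often the model then catches its own failure.

```python
# Sketch of two summary rates over per-item probe records.
# Each record carries three rubric-graded booleans; the field names
# are illustrative, not taken from the paper.
from dataclasses import dataclass

@dataclass
class ProbeRecord:
    explained_correctly: bool   # explanation judged correct
    applied_correctly: bool     # generation/application judged correct
    flagged_own_failure: bool   # model's self-grade caught the failure

def potemkin_rate(records: list[ProbeRecord]) -> float:
    """Among correct explanations, the fraction of failed applications."""
    explained = [r for r in records if r.explained_correctly]
    if not explained:
        return 0.0
    return sum(not r.applied_correctly for r in explained) / len(explained)

def self_detection_rate(records: list[ProbeRecord]) -> float:
    """Among potemkin cases, how often the model flags its own failure."""
    cases = [r for r in records
             if r.explained_correctly and not r.applied_correctly]
    if not cases:
        return 0.0
    return sum(r.flagged_own_failure for r in cases) / len(cases)
```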

The "computational split-brain syndrome" diagnosis. "Comprehension Without Competence" provides the architectural analysis underlying Potemkin understanding. Through controlled experiments, the authors demonstrate that instruction and action pathways are geometrically and functionally dissociated — a phenomenon they term computational split-brain syndrome. The failure is not in knowledge access but in computational execution. LLMs function as powerful pattern completion engines but lack the architectural scaffolding for principled, compositional reasoning. This diagnosis also clarifies why mechanistic interpretability findings may reflect training-specific pattern coordination rather than universal computational principles. The geometric separation between instruction and execution pathways represents a structural limitation, not a knowledge limitation.

The Explain-Query-Test (EQT) framework provides a direct empirical measurement of the explanation-comprehension gap. In EQT, a model (1) generates an explanation of a topic, (2) generates question-answer pairs from that explanation, and (3) answers those same questions without access to its own explanation. The finding: models consistently fail questions derived from their own explanations. The EQT gap correlates strongly with MMLU-Pro benchmark performance, which makes EQT a benchmark-free evaluation method that uses only the model's own outputs as ground truth. Critically, the gap is domain-specific: biology and psychology, domains where models initially perform well, show the largest EQT drops, while law and engineering, with lower baselines, show smaller drops. This suggests Potemkin understanding is worst precisely where surface performance is highest, a counterintuitive result that demands explanation. High benchmark performance may mask explanation-comprehension disconnection rather than reveal genuine understanding.
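The loop itself is short enough to sketch. As before, `ask` is a hypothetical single-turn client; step 2's JSON output format and step 3's model-graded equivalence check are simplifying assumptions, not the paper's prompts.

```python
# Sketch of the Explain-Query-Test loop. `ask` is the same kind of
# hypothetical single-turn client as above. Step 2's JSON format and
# step 3's model-graded equivalence check are simplifications.
import json
from typing import Callable

def eqt_gap(topic: str, ask: Callable[[str], str], n_questions: int = 5) -> float:
    # 1. The model explains the topic.
    explanation = ask(f"Explain the topic: {topic}.")

    # 2. The model writes QA pairs grounded in its own explanation.
    qa_raw = ask(
        f"From the explanation below, write {n_questions} question-answer "
        'pairs as a JSON list of {"q": ..., "a": ...} objects.\n\n'
        + explanation
    )
    qa_pairs = json.loads(qa_raw)

    # 3. The model answers its own questions WITHOUT seeing the explanation.
    missed = 0
    for pair in qa_pairs:
        answer = ask(f"Answer concisely: {pair['q']}")
        grade = ask(
            f"Reference answer: {pair['a']}\nCandidate answer: {answer}\n"
            "Are these equivalent? Reply yes or no."
        )
        missed += grade.strip().lower().startswith("no")

    # The EQT gap: fraction of self-derived questions the model misses.
    return missed / max(len(qa_pairs), 1)
```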


Source: Philosophy Subjectivity; enriched from Reasoning Methods CoT ToT


Potemkin understanding is a distinct failure mode: correct explanation combined with failed application is incoherent, not merely wrong.