LLM Reasoning and Architecture · Language Understanding and Pragmatics

Do language models understand in fundamentally different ways?

Does mechanistic evidence reveal distinct tiers of understanding in LLMs—from concept recognition to factual knowledge to principled reasoning? And do these tiers coexist rather than replace each other?

Note · 2026-04-18 · sourced from MechInterp
What actually happens inside the minds of language models? What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

This paper synthesizes mechanistic interpretability findings into a philosophical framework that moves beyond the binary "does AI understand?" debate. The framework proposes three hierarchical tiers:

Tier 1: Conceptual understanding — arises when a model forms "features": directions in latent space that unify diverse manifestations of a single entity or property. This is the representational foundation: the model has learned that different surface forms pick out the same underlying concept. Mechanistic-interpretability (MI) evidence: sparse autoencoder (SAE) features, linear probing, and representation-geometry studies all demonstrate this.
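As a caricature of how a linear probe recovers such a direction, here is a self-contained sketch on synthetic "activations". The dimensions, shift size, and training loop are illustrative assumptions, not details from the paper:

```python
# Toy linear probe: a "feature as a direction" recovered from synthetic
# activations in which a concept is linearly encoded (illustrative setup).
import numpy as np

rng = np.random.default_rng(0)
d = 32                                          # activation dimension (arbitrary)
concept_dir = rng.normal(size=d)
concept_dir /= np.linalg.norm(concept_dir)

# Synthetic activations: concept-present examples shifted along the direction.
n = 2000
labels = rng.integers(0, 2, size=n)             # 1 = concept present
acts = rng.normal(size=(n, d)) + 2.0 * labels[:, None] * concept_dir

# Linear probe = logistic regression fit by plain gradient descent.
w = np.zeros(d)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-acts @ w))         # predicted probabilities
    w -= 0.1 * (acts.T @ (p - labels)) / n      # gradient step on log-loss

probe_dir = w / np.linalg.norm(w)
alignment = abs(probe_dir @ concept_dir)        # high if probe recovers the direction
accuracy = np.mean((acts @ w > 0) == labels)
```

If the concept really is encoded as a single direction, the probe's weight vector aligns closely with it, which is the operational content of the Tier 1 claim.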

Tier 2: State-of-the-world understanding — arises when the model learns contingent factual connections between features and dynamically tracks changes to that state. "Michael Jordan is a basketball player" is then not just a high-probability string but a reflection of an internal model linking the Michael Jordan concept to the basketball-player concept. This goes beyond mere association to structured knowledge representation.
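One minimal way to caricature "features linked into structured knowledge" is a linear associative memory: facts stored as key–value outer products and retrieved by a matrix multiply. The entity vectors and the `recall_sport` helper below are hypothetical illustrations, not the paper's mechanism:

```python
# Toy sketch: factual links as a linear associative memory.
# Entity names and vectors are illustrative, not from the paper.
import numpy as np

rng = np.random.default_rng(1)
d = 64
entities = {name: rng.normal(size=d) / np.sqrt(d)
            for name in ["michael_jordan", "basketball",
                         "serena_williams", "tennis"]}

# Store "plays sport" facts as a sum of outer products: value @ key^T.
W = (np.outer(entities["basketball"], entities["michael_jordan"])
     + np.outer(entities["tennis"], entities["serena_williams"]))

def recall_sport(subject):
    # Retrieval: multiply by the subject vector, decode against candidates.
    out = W @ entities[subject]
    return max(["basketball", "tennis"], key=lambda s: out @ entities[s])
```

Because random high-dimensional keys are nearly orthogonal, each stored fact can be read back with little interference, which is the sense in which the link is represented structurally rather than as a memorized string.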

Tier 3: Principled understanding — arises when the model discovers compact "circuits" that connect facts via general rules rather than memorizing each fact individually. This is the shift from "knowing that" to "knowing why". The grokking literature provides the clearest evidence: models that transition from memorization to generalization develop circuits implementing genuine algorithmic rules (e.g., modular addition via discrete Fourier components).
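The algorithmic rule behind the grokked modular-addition circuit can be sketched directly: encode residues as angles on a circle, and score each candidate answer with cosines at a few key frequencies, which peak exactly at (a + b) mod p. This is a minimal numpy illustration of the rule itself, not the trained network; the particular frequencies are arbitrary choices:

```python
# Fourier-style rule for (a + b) mod p: score every candidate c by
# cos(2*pi*k*(a + b - c)/p), summed over a few frequencies k. The sum is
# maximized exactly when c == (a + b) mod p, since every cosine equals 1 there.
import numpy as np

p = 113                    # a modulus used in the grokking literature's setup
ks = [1, 5, 17]            # illustrative "key frequencies" (any k coprime to p works)

def mod_add_via_fourier(a, b):
    c = np.arange(p)
    logits = sum(np.cos(2 * np.pi * k * (a + b - c) / p) for k in ks)
    return int(np.argmax(logits))
```

Every cosine attains its maximum of 1 only at c ≡ a + b (mod p), so the argmax implements modular addition exactly; this is the kind of compact, rule-based circuit Tier 3 points to.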

The critical insight is that higher-tier mechanisms coexist with lower-tier heuristics rather than replacing them. A model can have principled understanding of arithmetic in one circuit while relying on pattern-matching heuristics in another. This heterogeneity means understanding is not a single binary property but a patchwork: principled in some domains, merely conceptual in others, and purely heuristic in yet others.

This has direct implications for trust and deployment. The fact that a model demonstrates principled understanding in one domain gives no guarantee that it operates at the same tier in adjacent domains. The coexistence of understanding tiers also explains why models can be simultaneously impressive and brittle: the principled circuits work reliably, but the heuristic patches fail unpredictably.



Mechanistic interpretability evidence supports three hierarchical varieties of LLM understanding — conceptual, then state-of-the-world, then principled — each tied to a distinct computational organization.