Why does LLM compression eliminate causal grounding in conceptual representations?
This tests a claim baked into the question, namely that the way LLMs compress concepts strips out causal grounding, and asks how far the corpus supports that claim and where it complicates it.
The corpus both supports the premise that LLM compression *eliminates* causal grounding and pushes back on it. The starting point is that LLMs and humans compress differently: measured through Rate-Distortion Theory, models maximize compression efficiency and capture broad category structure, while humans deliberately trade compression away to preserve the fine-grained, contextual distinctions that let them act in specific situations (see 'Do LLMs compress concepts more aggressively than humans do?'). That trade is the heart of the question: what gets thrown away in aggressive compression is exactly the situated detail that causal grounding lives in.
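The compression trade-off above can be made concrete with a toy rate-distortion objective. This is a minimal sketch, not the measure used in the cited note: it assumes hard cluster assignments over equally likely items (so the complexity term I(X;C) reduces to the entropy of the cluster labels), squared Euclidean distance as the distortion, and a hypothetical weight `beta` on distortion. The points and both labelings are invented for illustration.

```python
import math
from collections import Counter

def complexity_bits(labels):
    """Entropy of the cluster labels in bits.

    For a hard clustering of equally likely items this equals I(X;C).
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def distortion(points, labels):
    """Mean squared distance from each item to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(m[d] for m in members) / len(members) for d in range(dim)]
        total += sum(
            sum((m[d] - centroid[d]) ** 2 for d in range(dim)) for m in members
        )
    return total / len(points)

def rd_objective(points, labels, beta):
    """Complexity plus beta-weighted distortion (lower is better)."""
    return complexity_bits(labels) + beta * distortion(points, labels)

# Hypothetical 2-D "embeddings" of four items.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
coarse = [0, 0, 1, 1]  # broad categories only
fine = [0, 1, 2, 3]    # every fine-grained distinction kept

for beta in (0.5, 2.0):
    print(beta, rd_objective(points, coarse, beta), rd_objective(points, fine, beta))
```

Under these assumptions a small `beta` favors the coarse labeling and a large `beta` favors the fine one, which mirrors the contrast between a system that prizes compression and one that pays extra bits to keep contextual distinctions.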
But compression is only half the story; the other half is *what* is being compressed. Several notes argue the limitation is inherited from the medium, not invented by the model. Text-only LLMs are 'Plato's cave' learners: text strips out the physics, geometry, and causality present in the world before the model ever sees it, so the model manipulates symbols whose source dynamics were already removed (see 'Are text-only language models fundamentally limited by abstraction?'). In that framing, compression doesn't erase causal grounding so much as operate on input that never carried it. A complementary note pushes further: LLMs realize Saussure's *langue*, that is, meaning built purely from relational structure compressed out of text, with no external referent or embodied grounding required for fluent output (see 'Can language models learn meaning without engaging the world?'). Fluency without grounding is the design, not a bug.
What this costs shows up most sharply when reasoning gets decoupled from familiar meaning. When semantic content is stripped from a task, LLM performance collapses even when the correct rules are handed to the model in context, which is evidence that models lean on token associations and parametric commonsense rather than manipulating concepts as portable, causal objects (see 'Do large language models reason symbolically or semantically?'). The 'Potemkin understanding' failure is the same wound from another angle: a model can correctly explain a concept, fail to apply it, and then recognize its own failure. That split between explanation and execution pathways is one that genuinely causal representations wouldn't show (see 'Can LLMs understand concepts they cannot apply?'). And entailment work reveals models predicting 'A entails B' based on whether B simply appears attested in training, not whether A supports it: relational, causal grounding replaced by memorized co-occurrence (see 'Do LLMs predict entailment based on what they memorized?').
Here's the twist the corpus adds: causal grounding isn't uniformly absent; it's unevenly present. LLMs actually handle *causal* reasoning better than *temporal* reasoning, precisely because causal connectives ('because', 'therefore') are explicit and frequent in text while temporal order is usually left implicit (see 'Why do LLMs handle causal reasoning better than temporal reasoning?'). So what survives compression is whatever the training text made statistically loud. Models even reproduce *human* causal biases, such as weak explaining-away and Markov violations, suggesting they've absorbed the surface statistics of human causal talk rather than a grounded causal model underneath (see 'Do large language models make the same causal reasoning mistakes as humans?'). The picture from mechanistic interpretability fits: understanding is a patchwork of tiers, where compact principled circuits coexist with shallower heuristics rather than replacing them (see 'Do language models understand in fundamentally different ways?').
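For readers unfamiliar with explaining-away, the normative benchmark can be worked out on a tiny collider network (two independent causes, one shared effect). This is an illustrative sketch only: the noisy-OR likelihood, the prior of 0.3, the causal strength of 0.8, and the leak of 0.05 are all assumed values, not numbers from the cited note.

```python
from itertools import product

def p_effect(c1, c2, strength=0.8, leak=0.05):
    """Noisy-OR likelihood of the effect given which causes are present."""
    return 1 - (1 - leak) * (1 - strength) ** c1 * (1 - strength) ** c2

def posterior_c1(prior=0.3, given_c2=None):
    """P(C1=1 | E=1), optionally also conditioning on the value of C2.

    Computed by enumerating all four cause combinations.
    """
    numerator = denominator = 0.0
    for c1, c2 in product((0, 1), repeat=2):
        if given_c2 is not None and c2 != given_c2:
            continue
        joint = (
            (prior if c1 else 1 - prior)
            * (prior if c2 else 1 - prior)
            * p_effect(c1, c2)
        )
        denominator += joint
        if c1:
            numerator += joint
    return numerator / denominator

effect_only = posterior_c1()                # belief in C1 after seeing the effect
with_rival = posterior_c1(given_c2=1)       # belief in C1 once C2 is also known
print(round(effect_only, 3), round(with_rival, 3))
```

With these assumed parameters, learning that the rival cause is present should pull belief in the first cause from roughly 0.57 back toward its prior, to roughly 0.34. 'Weak' explaining-away means a reasoner, human or model, lowers its estimate by less than this normative drop.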
If there's a unifying answer, it's that 'compression eliminates causal grounding' is too clean. Compression preserves the *linguistic traces* of causality while dropping the *world-dynamics* those traces point to, which is why grounding can be partly restored from outside the compressed representation. Interleaving reasoning with real tool queries and environment feedback (ReAct) cuts hallucination by injecting world-contact at each step (see 'Can interleaving reasoning with real-world feedback prevent hallucination?'), and reasoning at the sentence-embedding level rather than token-by-token produces more coherent, language-agnostic abstraction (see 'Can reasoning happen at the sentence level instead of tokens?'). The grounding wasn't destroyed by compression so much as left behind by text, and it can be piped back in through action and architecture.
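The interleaving idea can be sketched as a small loop in which each reasoning step may call an external tool and the observation is fed back before the next step. This is a toy illustration of the control flow only, not the ReAct implementation: `scripted_policy` is a hypothetical stand-in for the language model, and `lookup` with its `FACTS` table is a hypothetical stand-in for a real tool such as a search API.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (kind, text), where kind is thought/action/observation

# Hypothetical stand-in for an external knowledge source.
FACTS: Dict[str, str] = {"boiling point of water at sea level": "100 C"}

def lookup(query: str) -> str:
    """Toy tool: returns a stored fact, or 'no result' if none is known."""
    return FACTS.get(query, "no result")

def scripted_policy(history: List[Step]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for the model: returns (thought, action, argument)."""
    observations = [text for kind, text in history if kind == "observation"]
    if not observations:
        return (
            "I should check rather than guess.",
            "lookup",
            "boiling point of water at sea level",
        )
    return ("The tool answered, so I can finish.", "finish", observations[-1])

def react_loop(
    policy: Callable[[List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[str, List[Step]]:
    """Alternate thoughts and tool calls until the policy chooses to finish."""
    history: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = policy(history)
        history.append(("thought", thought))
        if action == "finish":
            return argument, history
        # World contact: the observation comes from the tool, not the policy.
        observation = tools[action](argument)
        history.append(("action", f"{action}[{argument}]"))
        history.append(("observation", observation))
    return "no answer", history

answer, trace = react_loop(scripted_policy, {"lookup": lookup})
for kind, text in trace:
    print(f"{kind}: {text}")
print("answer:", answer)
```

The design point is that the final answer is copied from an observation rather than generated from the policy's own state, which is the sense in which grounding is piped back in from outside the compressed representation.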
Sources (11 notes)
Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.
Text strips the physics, geometry, and causality present in reality, forcing language models to manipulate symbols without grounding in their source dynamics. This creates predictable failure modes in physical, geometric, and causal reasoning that multimodal training could address.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, indicating that they respond to memorized propositions rather than premise-hypothesis relationships.
ChatGPT excels at causal relations but struggles with temporal ordering because causal connectives are explicit and frequent in training data, while temporal order is often implicit and must be inferred contextually.
LLMs show weak explaining away and Markov violations in collider networks, closely matching human error patterns. This suggests shared mechanisms rooted in training data statistics rather than a categorically inferior reasoning capacity.
Mechanistic interpretability reveals conceptual understanding (features as directions), state-of-world understanding (factual connections), and principled understanding (compact circuits). Crucially, higher tiers coexist with lower-tier heuristics rather than replacing them, creating a patchwork of capabilities.
ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) prevents error propagation by injecting real-world feedback at each step. On knowledge-intensive tasks it reduces the hallucination and error propagation seen in pure chain-of-thought; on the interactive benchmarks ALFWorld and WebShop it outperforms imitation and reinforcement learning baselines by 34% and 10% absolute success rate, respectively.
Meta's Large Concept Model operates on sentence embeddings rather than tokens, reasoning in a language-agnostic space before decoding to any target language. This hierarchical approach with paragraph-level planning produces more coherent output than flat token generation.