Do LLMs compress concepts more aggressively than humans do?
Do language models prioritize statistical compression over semantic nuance when forming conceptual representations, and how does this differ from human category formation? This matters because it may explain why LLMs fail at tasks requiring fine-grained distinctions.
An information-theoretic framework drawing from Rate-Distortion Theory and the Information Bottleneck principle quantitatively compares how LLMs and humans balance compression against semantic fidelity. The comparison uses seminal cognitive psychology datasets (Rosch typicality ratings, McCloskey & Glucksberg category membership) as human baselines.
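To make the trade-off concrete, here is a minimal sketch of a rate-distortion style objective over a hard clustering of concept embeddings: a rate term (entropy of the cluster assignment) stands in for representational complexity, and a distortion term (mean squared distance to cluster centroids) stands in for lost semantic detail. The estimators, the beta weight, and the toy data are illustrative assumptions, not the framework's exact formulation.

```python
import numpy as np
from sklearn.cluster import KMeans

def rd_objective(embeddings, labels, beta=1.0):
    """Return (rate, distortion, loss) for a hard clustering of embeddings.

    rate       -- entropy of the cluster assignment in bits, a stand-in for
                  representational complexity
    distortion -- mean squared distance of items to their cluster centroid,
                  a stand-in for lost semantic detail
    loss       -- rate + beta * distortion, the trade-off being balanced
    """
    n = len(labels)
    rate = 0.0
    total_sq_dist = 0.0
    for c in np.unique(labels):
        members = embeddings[labels == c]
        p = len(members) / n
        rate -= p * np.log2(p)                        # H(C)
        centroid = members.mean(axis=0)
        total_sq_dist += ((members - centroid) ** 2).sum()
    distortion = total_sq_dist / n
    return rate, distortion, rate + beta * distortion

# Toy usage: cluster 60 random "concept embeddings" at two granularities;
# coarser clustering lowers the rate but pays for it in distortion.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 16))
for k in (3, 12):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    rate, dist, loss = rd_objective(X, labels, beta=1.0)
    print(f"k={k:2d}  rate={rate:.2f} bits  distortion={dist:.2f}  loss={loss:.2f}")
```

On this reading, a low-beta regime tolerates distortion in order to shrink the rate; the claim developed below is that LLM representations behave as if they sit in that regime, while human categories accept a higher rate to keep nuance.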
Where they converge: LLM-derived clusters significantly align with human-defined conceptual categories. Broad category structure — that robins and sparrows are birds, that chairs and tables are furniture — is captured reliably. Some encoder models achieve surprisingly strong alignment with human categorical structure, sometimes outperforming much larger models, suggesting factors beyond scale matter for human-like abstraction.
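One way to operationalize this alignment is to cluster the LLM's item embeddings and score agreement with human-normed category assignments using the Adjusted Rand Index. The sketch below uses k-means and synthetic stand-in data; the study's actual clustering procedure and datasets may differ.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def category_alignment(item_embeddings, human_category_ids, n_clusters):
    """Cluster LLM item embeddings, then score agreement with human categories."""
    llm_clusters = KMeans(n_clusters=n_clusters, n_init=10,
                          random_state=0).fit_predict(item_embeddings)
    return adjusted_rand_score(human_category_ids, llm_clusters)

# Toy usage: three synthetic "categories" standing in for real embeddings
# and human-normed category labels (e.g. birds, furniture, tools).
rng = np.random.default_rng(1)
emb = np.vstack([rng.normal(loc=c, size=(20, 32)) for c in (0.0, 2.0, 4.0)])
human = np.repeat([0, 1, 2], 20)
print(f"ARI = {category_alignment(emb, human, n_clusters=3):.2f}")
```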
Where they diverge: LLMs fail to capture fine-grained semantic distinctions crucial for human understanding. Correlations between LLM item-to-category-label similarities and human typicality judgments are generally modest. Items humans perceive as highly typical (robin as prototypical bird) are not consistently represented as substantially more similar to the category label embedding than atypical items (penguin as bird).
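The typicality comparison can be sketched as a rank correlation between item-to-category-label cosine similarities and human typicality ratings; "generally modest" refers to a low Spearman rho in this kind of setup. The embeddings and ratings below are synthetic placeholders, not Rosch's actual norms.

```python
import numpy as np
from scipy.stats import spearmanr

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def typicality_correlation(item_embeddings, category_label_embedding, human_typicality):
    """Correlate item-to-label cosine similarity with human typicality ratings."""
    sims = [cosine(e, category_label_embedding) for e in item_embeddings]
    rho, p = spearmanr(sims, human_typicality)
    return rho, p

# Toy usage: ten "bird" exemplars with made-up embeddings and 1-7 ratings
# (1 = atypical, like penguin; 7 = highly typical, like robin).
rng = np.random.default_rng(2)
items = rng.normal(size=(10, 32))
bird_label = rng.normal(size=32)
ratings = rng.uniform(1, 7, size=10)
rho, p = typicality_correlation(items, bird_label, ratings)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```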
The fundamental divergence in strategy: LLMs exhibit a strong bias toward aggressive statistical compression, maximally reducing representational complexity. Human conceptual systems prioritize adaptive nuance and contextual richness, even at the cost of lower compression efficiency. Humans preserve distinctions that matter for situated action (the difference between a robin and a penguin matters for different reasons in different contexts), while LLMs collapse these distinctions in favor of statistical regularity.
This finding refines the debate around "Can text-trained models compress images better than specialized tools?" LLMs are excellent compressors, but compression is not comprehension. Their compression strategy differs fundamentally from how humans organize concepts. Human categorization isn't optimized for compression; it's optimized for adaptive action in context. The "cost" of preserving nuance (lower compression efficiency) is paid because nuance has survival value.
This connects to "Does semantic grounding in language models come in degrees?" by providing an information-theoretic mechanism for weak causal grounding: causal reasoning depends on exactly the fine-grained distinctions (the specific weight, texture, and behavior of a robin versus a penguin) that aggressive compression eliminates, so the nuance LLMs discard is the nuance grounding requires.
The literary language dimension: literary language is where the compression-nuance divergence becomes most consequential. Literary prose and poetry are maximally nuanced: every word choice is deliberate, ambiguity is preserved intentionally, and connotation carries as much weight as denotation. LLM compression preserves denotation (what a text literally says) but destroys connotation (what a text means through association, implication, and resonance). This is testable: having LLMs paraphrase poetry and measuring which dimensions of meaning survive versus collapse would quantify the gap between understanding what a text says and understanding what a text means; a sketch of such a harness follows below. As argued in "Can LLMs truly understand literary meaning or just mechanics?", the compression-nuance trade-off is one of four converging mechanisms behind the mechanics-meaning gap.
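A speculative harness for that paraphrase test: supply one scoring function per meaning dimension, and it reports how much each dimension survives between poem and paraphrase. The example denotation proxy (content-word overlap) and the sample texts are rough stand-ins; a connotation scorer, the harder and more interesting dimension, is left to the experimenter.

```python
from typing import Callable, Dict

def survival_report(original: str, paraphrase: str,
                    scorers: Dict[str, Callable[[str, str], float]]) -> Dict[str, float]:
    """Score each named dimension of meaning; 1.0 means fully preserved."""
    return {name: fn(original, paraphrase) for name, fn in scorers.items()}

STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "was", "it"}

def content_word_overlap(original: str, paraphrase: str) -> float:
    """Crude denotation proxy: Jaccard overlap of content words."""
    o = {w.strip(",.;:") for w in original.lower().split()} - STOPWORDS
    p = {w.strip(",.;:") for w in paraphrase.lower().split()} - STOPWORDS
    return len(o & p) / len(o | p) if o | p else 1.0

# Toy usage: a line of poetry and a flat paraphrase of it.
poem = "The sea is calm tonight, the tide is full, the moon lies fair"
paraphrase = "Tonight the ocean is quiet, the tide high, and the moon bright"
print(survival_report(poem, paraphrase, {"denotation": content_word_overlap}))
```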
Source: Cognitive Models Latent; enriched from inbox/research-brief-llm-literary-analysis-2026-03-02.md
Related concepts in this collection
- Can text-trained models compress images better than specialized tools?
  Do general-purpose language models trained only on text outperform domain-specific compressors like PNG and FLAC on their native data? This tests whether compression ability is universal or requires domain specialization.
  qualified: LLMs compress excellently but with a strategy fundamentally different from human conceptual compression
- Does semantic grounding in language models come in degrees?
  Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
  mechanism: aggressive compression eliminates fine-grained distinctions needed for causal grounding
- Why do language models fail at communicative optimization?
  LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
  convergent: statistical regularity capture without communicative optimization parallels compression without nuance
- Are language models developing real functional competence or just formal competence?
  Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms. Can next-token prediction alone produce both, or does it leave functional competence behind?
  the compression-nuance split may correspond to the formal-functional split
- Do standard analysis methods hide nonlinear features in neural networks?
  Current representation analysis tools like PCA and linear probing may systematically miss complex nonlinear computations while over-reporting simple linear features. This raises questions about whether our interpretability methods are actually capturing what networks compute.
  analysis bias compounds the compression problem: LLMs aggressively compress representations, and our analysis tools are biased toward detecting the simple features that survive compression while missing the complex features that nuance requires; the measured gap between LLM and human conceptual representations may be partly an analysis artifact
- Can we measure reading efficiency as a quality metric?
  How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally dense.
  Knowledge density (KD) provides a measurable consequence of aggressive compression: LLMs that compress conceptual representations into statistical patterns produce text with lower knowledge density, as compression eliminates the nuanced distinctions that create unique atomic knowledge units
Original note title
LLMs prioritize aggressive statistical compression while humans preserve adaptive nuance and contextual richness in conceptual representations