From Tokens to Thoughts: How LLMs and Humans Trade Compression for Meaning
Humans organize knowledge into compact categories through semantic compression by mapping diverse instances to abstract representations while preserving meaning (e.g., robin and blue jay are both birds; most birds can fly). These concepts reflect a trade-off between expressive fidelity and representational simplicity. Large Language Models (LLMs) demonstrate remarkable linguistic abilities, yet whether their internal representations strike a human-like trade-off between compression and semantic fidelity is unclear. We introduce a novel information-theoretic framework, drawing from Rate-Distortion Theory and the Information Bottleneck principle, to quantitatively compare these strategies. Analyzing token embeddings from a diverse suite of LLMs against seminal human categorization benchmarks, we uncover key divergences. While LLMs form broad conceptual categories that align with human judgment, they struggle to capture the fine-grained semantic distinctions crucial for human understanding. More fundamentally, LLMs demonstrate a strong bias towards aggressive statistical compression, whereas human conceptual systems appear to prioritize adaptive nuance and contextual richness, even if this results in lower compressional efficiency by our measures.
Despite these capabilities, a fundamental enigma persists: Do LLMs truly grasp concepts and meaning analogously to humans, or is their success primarily rooted in sophisticated statistical pattern matching over vast datasets? This question is particularly salient given the human ability to effortlessly distill extensive input into compact, meaningful concepts, a process governed by the inherent trade-off between informational compression and semantic fidelity [Tversky, 1977, Rosch, 1973b].
As the mental scaffolding of human cognition, concepts enable efficient interpretation, generalization from sparse data, and rich communication. For LLMs to transcend surface-level mimicry and achieve more human-like understanding, it is critical to investigate how their internal representations navigate the crucial trade-off between information compression and the preservation of semantic meaning. Do LLMs develop conceptual structures mirroring the efficiency and richness of human thought, or do they employ fundamentally different representational strategies?
To address this, we introduce a novel quantitative methodology rooted in information theory. We develop and apply a framework drawing from Rate-Distortion Theory [Shannon, 1948] and the Information Bottleneck principle [Tishby et al., 2000] to systematically compare how LLMs and human conceptual structures balance representational complexity (compression) with semantic fidelity. As a crucial human baseline, we leverage seminal datasets from cognitive psychology detailing human categorization [Rosch, 1973a, 1975, McCloskey and Glucksberg, 1978]. A contribution of this work is the digitization and public release of these classic datasets, which offer benchmarks of high empirical rigor often exceeding modern crowdsourced alternatives. Our framework is tailored to dissect how these different systems navigate the compression-meaning trade-off.
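To make the trade-off concrete, one can score a candidate conceptual system (a hard clustering of items) with an objective of the form L = Complexity + β · Distortion. The Python sketch below is a minimal illustration of this style of objective, not the exact functional used in our analysis: complexity is measured by the entropy of the cluster assignment, H(C), which equals the mutual information I(X; C) when distinct items are deterministically assigned, and distortion by the mean squared distance of items to their cluster centroid in embedding space.

```python
import numpy as np

def ib_style_objective(embeddings, assignments, beta=1.0):
    """Score a hard clustering by a rate-distortion-style trade-off:
    complexity (rate) plus beta times distortion. Illustrative form only."""
    embeddings = np.asarray(embeddings, dtype=float)
    assignments = np.asarray(assignments)
    clusters = np.unique(assignments)
    # Complexity: entropy of the cluster assignment H(C), which equals
    # I(X; C) for a deterministic assignment of distinct items.
    p_c = np.array([(assignments == c).mean() for c in clusters])
    complexity = -(p_c * np.log2(p_c)).sum()
    # Distortion: mean squared Euclidean distance to cluster centroids.
    sq_dists = []
    for c in clusters:
        members = embeddings[assignments == c]
        centroid = members.mean(axis=0)
        sq_dists.append(((members - centroid) ** 2).sum(axis=1))
    distortion = np.concatenate(sq_dists).mean()
    return complexity + beta * distortion
```

Lower values indicate a more favorable balance at a given β; sweeping β traces out how much semantic fidelity a system sacrifices for additional compression.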
Our comparative analysis across a diverse suite of LLMs reveals divergent representational strategies. While LLMs generally form broad conceptual categories aligned with human judgment, they often fail to capture the fine-grained semantic distinctions pivotal to human understanding. More critically, we uncover a stark contrast in priorities: LLMs exhibit a strong drive towards aggressive statistical compression, whereas human conceptual systems appear to favor adaptive nuance and contextual richness, even at a potential cost to sheer compressional efficiency by our measures. This divergence underscores fundamental differences and informs pathways for developing AI with more human-aligned conceptual understanding.
LLM-derived clusters significantly align with human-defined conceptual categories, suggesting they capture key aspects of human conceptual organization. Notably, certain encoder models exhibit surprisingly strong alignment, sometimes outperforming much larger models, highlighting that factors beyond sheer scale influence human-like categorical abstraction.
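As a sketch of how such alignment can be quantified, the snippet below clusters item embeddings and scores agreement with the human-assigned categories using two standard external clustering metrics; the choice of k-means and of these particular metrics is illustrative, not a specification of the full analysis pipeline.

```python
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score, adjusted_mutual_info_score

def cluster_alignment(item_embeddings, human_category_labels, seed=0):
    """Cluster LLM token embeddings and score agreement with
    human-defined categories (chance-corrected, higher is better)."""
    # Set k to the number of human categories so the comparison is fair.
    k = len(set(human_category_labels))
    pred = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(item_embeddings)
    return {
        "ARI": adjusted_rand_score(human_category_labels, pred),
        "AMI": adjusted_mutual_info_score(human_category_labels, pred),
    }
```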
To probe finer-grained semantic structure, we calculated the cosine similarity between each item’s token embedding and the token embedding of its human-assigned category name (e.g., ‘robin’ vs. ‘bird’). These item-to-category-label similarities were then correlated (Spearman’s ρ [Wissler, 1905]) with human-rated typicality scores.
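A minimal sketch of this computation follows; function and variable names are illustrative.

```python
import numpy as np
from scipy.stats import spearmanr

def typicality_correlation(item_vecs, category_vec, human_typicality):
    """Correlate item-to-category-label embedding similarity with
    human typicality ratings."""
    item_vecs = np.asarray(item_vecs, dtype=float)
    category_vec = np.asarray(category_vec, dtype=float)
    # Cosine similarity between each item embedding and the
    # category-label embedding (e.g., 'robin' vs. 'bird').
    sims = item_vecs @ category_vec / (
        np.linalg.norm(item_vecs, axis=1) * np.linalg.norm(category_vec)
    )
    # Spearman's rho between model similarities and human ratings.
    rho, p_value = spearmanr(sims, human_typicality)
    return rho, p_value
```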
Results and Observations: Spearman correlations between LLM-derived item-to-category-label similarities and human typicality judgments are generally modest across most models and datasets (Table 2 in Appendix A.5; Figure 6). Although some correlations reach statistical significance (p < 0.05), their magnitudes typically indicate a limited correspondence. This pattern suggests that items humans perceive as highly typical of a category are not consistently represented by LLMs as substantially more similar to that category label’s embedding.