Why do Shannon and Kolmogorov measures fail to value data?

Shannon information and Kolmogorov complexity assume unlimited computational capacity. But do these classical measures actually capture what bounded learners can extract from real data?

Note · 2026-05-28 · sourced from Data

When asked the practical question "how much can be learned from this data?", the two canonical information measures come up nearly empty-handed. Shannon information and Kolmogorov complexity both assume an observer with unlimited computational capacity, and neither targets the useful information content — the structure a learner could actually extract and exploit. "From Entropy to Epiplexity" makes this concrete through three apparent paradoxes that are mathematically justified under classical theory yet clash with empirical practice: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching.
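For reference, the three classical statements behind these paradoxes can be written out as follows. These are standard textbook forms, not necessarily the paper's own notation; the symbols $f$ (a computable deterministic map), $\pi$ (a permutation of indices), $p$ (the data distribution), and $q$ (the model) are assumptions introduced here for illustration.

```latex
% (1) A deterministic map cannot add information
%     (Shannon entropy; Kolmogorov complexity for computable f)
H\bigl(f(X)\bigr) \le H(X),
\qquad
K\bigl(f(x)\bigr) \le K(x) + K(f) + O(1)

% (2) Joint entropy does not depend on the order of the variables
H(X_1, \dots, X_n) = H\bigl(X_{\pi(1)}, \dots, X_{\pi(n)}\bigr)

% (3) Expected negative log-likelihood is minimized by matching the distribution
\mathbb{E}_{x \sim p}\bigl[-\log q(x)\bigr]
  = H(p) + D_{\mathrm{KL}}(p \,\|\, q) \;\ge\; H(p),
\quad \text{with equality iff } q = p
```

Each line is correct as mathematics; the tension the note describes comes from applying them to learners whose compute is limited.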

Each paradox marks a place where the unbounded-compute assumption bites. To an observer with infinite compute, a deterministic transform adds nothing (the observer could always recompute it from the original data), data order is irrelevant (it can reorder freely), and likelihood modeling cannot exceed the generating process. But computationally bounded learners, the only kind that exist, experience exactly the opposite: feature engineering helps, curriculum order matters, and trained models can encode programs more complex than their data-generating process. The classical measures are not wrong; they answer a question (compressibility for a computationally unbounded agent) that is not the question machine learning poses.
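The gap between an unbounded and a bounded observer can be illustrated with a toy experiment. This is a minimal sketch and not the paper's method: it assumes `zlib` as a stand-in for a compute-bounded observer, uses the byte-histogram entropy as a simple order-blind statistic, and the helper names `empirical_entropy_bits` and `zlib_size` are hypothetical.

```python
import math
import random
import zlib
from collections import Counter


def empirical_entropy_bits(data: bytes) -> float:
    """Zeroth-order Shannon entropy of the byte histogram, in bits per byte."""
    counts = Counter(data)
    total = len(data)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def zlib_size(data: bytes) -> int:
    """Codelength in bytes under zlib (assumed proxy for a bounded observer)."""
    return len(zlib.compress(data, 9))


n = 100_000
rng = random.Random(0)

# A short program plus a seed fully determines this stream, so its
# description length is small for an observer able to search over programs.
pseudo_random = bytes(rng.getrandbits(8) for _ in range(n))

# The same multiset of bytes in a different order.
ordered = bytes(sorted(pseudo_random))

# The histogram entropy ignores order, so both values agree.
print(empirical_entropy_bits(pseudo_random), empirical_entropy_bits(ordered))

# The bounded compressor does not ignore order: it finds almost no structure
# in the generator output but compresses the sorted copy heavily.
print(zlib_size(pseudo_random), zlib_size(ordered))
```

The generator output looks incompressible to `zlib` even though a seed and a few lines of code reproduce it exactly, and reordering the same bytes changes the bounded observer's codelength while leaving the order-blind statistic untouched.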

The significance is foundational rather than incremental: it explains why a theory of data value has been elusive. The field has been trying to value data with tools built for a different problem. This belongs with the vault's learning-mechanics thread (see "Can deep learning theory unify around training dynamics?"): the move from worst-case to average-case, compute-aware reasoning is exactly the shift epiplexity makes for information theory. It also reframes the modeling-is-compression identity (see "Can text-trained models compress images better than specialized tools?"): compression-as-modeling holds for an idealized coder, but the value a bounded model extracts is not captured by codelength alone.

Counterpoint: the classical measures remain correct and useful for their intended questions (channel capacity, incompressibility); the claim is about fit to the data-valuation problem, not about their validity.

Why it matters: it diagnoses the conceptual gap that any practical theory of data selection must fill.

— "From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence", https://arxiv.org/abs/2601.03220

Related concepts in this collection

Can deep learning theory unify around training dynamics? Is learning mechanics—focused on average-case predictions and training dynamics rather than worst-case bounds—the emerging framework that finally unifies fragmented deep learning theory?
the broader compute-aware, average-case turn epiplexity instantiates for information theory
Can text-trained models compress images better than specialized tools? Do general-purpose language models trained only on text outperform domain-specific compressors like PNG and FLAC on their native data? This tests whether compression ability is universal or requires domain specialization.
the modeling-is-compression identity epiplexity refines by distinguishing codelength from extractable value
What can a bounded observer actually learn from data? Classical information measures treat all high-entropy content equally, but computationally bounded learners can only extract certain types of structure. What distinguishes learnable regularity from random noise that bounded agents face?
enables: the positive proposal this note motivates — the same paper's bounded-observer measure that fills the data-valuation gap the classical measures leave open

Original note title

shannon and kolmogorov measures fail to value data because they assume unbounded-compute observers
