Reasoning and Learning Architectures

Why do Shannon and Kolmogorov measures fail to value data?

Shannon information and Kolmogorov complexity assume unlimited computational capacity. But do these classical measures actually capture what bounded learners can extract from real data?

Note · 2026-05-28 · sourced from Data

When asked the practical question "how much can be learned from this data?", the two canonical information measures come up nearly empty-handed. Shannon information and Kolmogorov complexity both assume an observer with unlimited computational capacity, and neither targets the useful information content — the structure a learner could actually extract and exploit. "From Entropy to Epiplexity" makes this concrete through three apparent paradoxes that are mathematically justified under classical theory yet clash with empirical practice: (1) information cannot be increased by deterministic transformations; (2) information is independent of the order of data; (3) likelihood modeling is merely distribution matching.
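The classical side of paradox (1) can be checked directly for Shannon entropy. Below is a minimal sketch of that check: a distribution is pushed through a deterministic map, and the resulting entropy never exceeds H(X). The toy distribution and the two maps (a many-to-one parity map and an invertible affine map mod 4) are illustrative assumptions, not examples taken from the paper.

```python
import math


def entropy_bits(dist):
    """Shannon entropy H = -sum p log2 p of an {outcome: probability} dict."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def pushforward(dist, f):
    """Distribution of f(X) when X ~ dist and f is deterministic."""
    out = {}
    for x, p in dist.items():
        y = f(x)
        out[y] = out.get(y, 0.0) + p
    return out


X = {x: 0.25 for x in range(4)}  # uniform on {0, 1, 2, 3}: H(X) = 2 bits

H_x = entropy_bits(X)
# Many-to-one map: information is lost, H(f(X)) < H(X).
H_parity = entropy_bits(pushforward(X, lambda x: x % 2))
# Invertible map (a permutation of {0,1,2,3}): information is preserved, never gained.
H_bijection = entropy_bits(pushforward(X, lambda x: (3 * x + 1) % 4))

print(H_x, H_parity, H_bijection)
```

Running it prints `2.0 1.0 2.0`. No deterministic `f` can push the entropy above 2 bits, which is exactly the guarantee a bounded learner's experience with feature engineering appears to contradict.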

Each paradox is a place where the unbounded-compute assumption bites. To an observer with infinite compute, a deterministic transform adds nothing (it could always invert or recompute it), data order is irrelevant (it can reorder freely), and likelihood modeling cannot exceed the generating process. But computationally bounded learners — the only kind that exist — experience exactly the opposite: feature engineering helps, curriculum order matters, and trained models can encode programs more complex than their data-generating process. The classical measures are not wrong; they answer a question (compressibility for an omniscient agent) that is not the question machine learning poses.
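The gap between an unbounded and a bounded observer can be made tangible with a rough sketch. It assumes that `zlib` is a reasonable stand-in for a cheap, computationally bounded learner and that an order-blind byte histogram is a reasonable stand-in for a measure that ignores ordering. Neither assumption comes from the paper, and nothing here computes epiplexity. The bytes come from a seeded PRNG, so a short program describes them, yet `zlib` cannot find that program. Sorting the same bytes leaves the histogram entropy unchanged but makes them highly compressible to `zlib`. `N` and `SEED` are hypothetical values, and `randbytes` requires Python 3.9+.

```python
import math
import random
import zlib
from collections import Counter

N = 10_240  # hypothetical sample size
SEED = 0    # hypothetical seed; any value works


def empirical_entropy_bits(data: bytes) -> float:
    """Per-byte Shannon entropy of the byte histogram (blind to order)."""
    n = len(data)
    counts = sorted(Counter(data).values())
    return -sum(c / n * math.log2(c / n) for c in counts)


def zlib_bits_per_byte(data: bytes) -> float:
    """Codelength per byte achieved by zlib, used here as a bounded observer."""
    return 8 * len(zlib.compress(data, 9)) / len(data)


# A tiny deterministic program (PRNG + seed) emits these bytes, so an observer
# able to search over programs could describe them in very few bits.
prng_bytes = random.Random(SEED).randbytes(N)
# Same multiset of bytes, different order.
sorted_bytes = bytes(sorted(prng_bytes))

for name, data in [("prng", prng_bytes), ("sorted", sorted_bytes)]:
    print(name,
          round(empirical_entropy_bits(data), 3),
          round(zlib_bits_per_byte(data), 3))
```

Both rows report the same histogram entropy (close to 8 bits per byte). The PRNG row costs `zlib` about 8 bits per byte, and the sorted row costs far less. The bounded coder is sensitive to both the hidden generator and the ordering, which the order-blind measure cannot register.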

The significance is foundational rather than incremental: it explains why a theory of data value has been elusive. The field has been trying to value data with tools built for a different problem. This note belongs with the vault's learning-mechanics thread (see "Can deep learning theory unify around training dynamics?"): the move from worst-case to average-case, compute-aware reasoning described there is the same shift that epiplexity makes for information theory. It also reframes the modeling-is-compression identity (see "Can text-trained models compress images better than specialized tools?"): compression-as-modeling holds for an idealized coder, but codelength alone does not capture the value a bounded model extracts. Counterpoint: the classical measures remain correct and useful for their intended questions (channel capacity, incompressibility); the claim concerns their fit to the data-valuation problem, not their validity. Why it matters: it diagnoses the conceptual gap that any practical theory of data selection must fill.
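The point that codelength alone misses what a model has extracted can be illustrated with a generic two-part (MDL-style) intuition. This sketch is not the paper's definition of epiplexity. It computes the ideal arithmetic-coding codelength, the sum of -log2 p, of one periodic sequence under three hypothetical models of increasing capacity. The data cost falls toward zero, but those bits are not gone: they have moved into the description of the model, here the pattern `PATTERN`. The sequence, the models, and the `EPS` floor are illustrative assumptions.

```python
import math


def codelength_bits(seq, prob):
    """Ideal (arithmetic-coding) codelength: sum over positions of -log2 prob(i, s)."""
    return sum(-math.log2(prob(i, s)) for i, s in enumerate(seq))


PATTERN = "aaaaaaab"   # hypothetical structure hidden in the data
seq = PATTERN * 8      # 64 symbols: 56 'a' and 8 'b'
EPS = 1e-9             # small floor so the periodic model stays a proper distribution
freqs = {s: seq.count(s) / len(seq) for s in set(seq)}

models = {
    # Knows nothing about the data.
    "uniform": lambda i, s: 0.5,
    # Captures the 7:1 symbol skew but not the ordering.
    "unigram": lambda i, s: freqs[s],
    # Has absorbed the period-8 pattern and predicts each symbol almost surely.
    "periodic": lambda i, s: 1 - EPS if s == PATTERN[i % len(PATTERN)] else EPS,
}

for name, model in models.items():
    print(name, round(codelength_bits(seq, model), 2))
```

This prints `uniform 64.0`, `unigram 34.79`, and `periodic 0.0`. Ranking the models by data codelength alone says nothing about how much structure each one had to learn to get there, which is the quantity a data-valuation theory cares about.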


— "From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence", https://arxiv.org/abs/2601.03220

Original note title

shannon and kolmogorov measures fail to value data because they assume unbounded-compute observers