What can a bounded observer actually learn from data?
Classical information measures treat all high-entropy content alike, but a computationally bounded learner can extract only certain kinds of structure. What distinguishes learnable regularity from noise that a bounded agent cannot exploit?
Having diagnosed why classical measures fail, the paper introduces epiplexity: a formalization of what a computationally bounded observer can actually learn from data. The key separation is between structural content (the learnable, reusable regularity a bounded learner can extract) and time-bounded entropy (content that classical measures count as information but that a bounded learner can neither predict nor reuse). Pseudorandom number generators (strictly, cryptographically secure ones) and chaotic dynamical systems are the canonical examples: high apparent entropy, near-zero epiplexity, because no efficient learner can exploit them.
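The pseudorandom example can be made concrete with a rough sketch. Everything here is an assumption for illustration, not the paper's method: `zlib` stands in for a (very weak) bounded observer, `compressed_ratio` is a hypothetical helper, and Python's `random` module (a Mersenne Twister, which is not cryptographically secure) is used only because it is seedable and reproducible.

```python
import random
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by raw size; zlib is a stand-in bounded observer."""
    return len(zlib.compress(data, 9)) / len(data)


n = 100_000

# Structured data: a short pattern repeated, which zlib's bounded search can exploit.
pattern = b"the quick brown fox jumps over the lazy dog. "
structured = (pattern * (n // len(pattern) + 1))[:n]

# Pseudorandom data: fully determined by the generator and the seed 0,
# so a short program reproduces it, yet zlib finds nothing to exploit.
rng = random.Random(0)
pseudorandom = bytes(rng.getrandbits(8) for _ in range(n))

print(f"structured:   {compressed_ratio(structured):.3f}")
print(f"pseudorandom: {compressed_ratio(pseudorandom):.3f}")
```

The structured stream shrinks to a small fraction of its size, while the seeded stream stays essentially incompressible even though a few lines of code regenerate it exactly. That gap between a short true description and what a bounded observer can exploit is the intuition behind time-bounded entropy; a stronger observer than zlib would be needed to say anything about epiplexity proper.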
This single distinction resolves the three paradoxes at once. Information can be created by deterministic computation, because a transform can make latent structure efficiently accessible. Information does depend on data order, because ordering changes what a bounded learner can extract along the way. And likelihood modeling can produce programs more complex than the generating process, because the model encodes extractable structure, not just the source's codelength. Crucially, epiplexity is task-free: it measures learnable structure without reference to a downstream objective, which is what makes it a candidate foundation for data selection as opposed to model selection.
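The first two points (deterministic transforms and data order) can be illustrated with a small sketch, under loud assumptions: `zlib` is again only a stand-in for a bounded learner, the toy corpus and the helpers `byte_entropy` and `zlib_size` are hypothetical, and a fixed-seed byte permutation plays the role of a deterministic, invertible transform.

```python
import math
import random
import zlib
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Order-free Shannon entropy of the byte histogram, in bits per byte."""
    n = len(data)
    counts = sorted(Counter(data).values())  # sorted so the sum ignores byte order
    return -sum(c / n * math.log2(c / n) for c in counts)


def zlib_size(data: bytes) -> int:
    """Compressed size in bytes; zlib is only a stand-in for a bounded learner."""
    return len(zlib.compress(data, 9))


# Toy corpus with obvious sequential structure (an illustrative assumption).
original = b"".join(
    f"record {i:05d}: value={i * i % 97}\n".encode() for i in range(2000)
)

# A fixed-seed permutation is a deterministic, invertible transform.
perm = list(range(len(original)))
random.Random(0).shuffle(perm)
shuffled = bytes(original[j] for j in perm)

# Undo the permutation to confirm nothing was destroyed.
restored = bytearray(len(original))
for pos, j in enumerate(perm):
    restored[j] = shuffled[pos]

print("histogram entropy:", byte_entropy(original), byte_entropy(shuffled))
print("zlib bytes:       ", zlib_size(original), zlib_size(shuffled))
print("invertible:       ", bytes(restored) == original)
```

An order-blind measure (the byte histogram) assigns both versions the same entropy, and the permutation loses nothing because it can be undone. The bounded compressor nonetheless extracts far more from the ordered version. Read in reverse, un-shuffling is a deterministic computation that makes latent structure accessible, which is the sense of "created" used above.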
The practical payoff is empirical, not just conceptual. The paper gives procedures to estimate epiplexity that capture differences across data sources, track with downstream performance, and flag dataset interventions that improve out-of-distribution generalization. That last result is the strongest: a task-free structural measure that nonetheless correlates with OOD generalization would explain why some data enables broader transfer than others. This fits the vault's data-curation thread: per the note "Can we prune training data without hurting model performance?", examples differ in learnable value, and epiplexity proposes the underlying quantity that difficulty metrics approximate.
Counterpoint and caution: epiplexity is observer-relative (bounded to what compute class?) and estimated, not computed exactly, so its claims inherit the slack of the estimator and the choice of observer.
Why it matters: it offers the first principled, task-free quantity for deciding which data to select, generate, or transform for learning.
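To make "estimated, not computed exactly" tangible, here is a minimal sketch of one generic estimator style, prequential (predict, then update) coding. Treating this as representative of the paper's procedures is an assumption; the order-3 count model, the helper names, and the tail-based split into "structure" and "residual" bits are all hypothetical choices made for this note.

```python
import math
import random


def prequential_losses(bits, order=3):
    """Per-symbol code length (bits) of an online order-k count model.

    Each symbol is scored before the model updates on it, so the summed
    losses form a valid code length for the sequence.
    """
    counts = {}  # context tuple -> [count of 0, count of 1]
    losses = []
    for i in range(order, len(bits)):
        ctx = tuple(bits[i - order:i])
        c = counts.setdefault(ctx, [1, 1])  # Laplace prior
        p = c[bits[i]] / (c[0] + c[1])
        losses.append(-math.log2(p))
        c[bits[i]] += 1
    return losses


def split_code_length(losses, tail=0.1):
    """Hypothetical split: late-training loss times n as residual bits,
    and the excess paid while learning as structure bits."""
    k = max(1, int(len(losses) * tail))
    floor = sum(losses[-k:]) / k
    residual_bits = floor * len(losses)
    structure_bits = max(0.0, sum(losses) - residual_bits)
    return structure_bits, residual_bits


n = 7000
periodic = [0, 0, 1, 0, 1, 1, 1] * (n // 7)  # fully predictable from 3 bits of context
rng = random.Random(0)
noise = [rng.getrandbits(1) for _ in range(n)]  # seeded coin flips

for name, seq in [("periodic", periodic), ("noise", noise)]:
    s, r = split_code_length(prequential_losses(seq))
    print(f"{name}: structure ~ {s:.1f} bits, residual ~ {r:.1f} bits")
```

For the periodic sequence most of the code length is paid while learning the pattern, whereas for the coin flips nearly all of it is residual. Both caveats from the note show up directly: changing `order` changes the observer, and changing `tail` changes the estimator, and either shifts the numbers. In this toy even the noise source pays a small one-time cost for fitting counts, so the informative contrast is where the bulk of the code length lands.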
— "From Entropy to Epiplexity: Rethinking Information for Computationally Bounded Intelligence", https://arxiv.org/abs/2601.03220
Related concepts in this collection
- Can we prune training data without hurting model performance?
  This explores whether difficulty metrics can identify redundant training examples that can be safely removed. It matters because most datasets contain massive waste: if we can find which examples are truly necessary, we could train better models on far less data.
  difficulty metrics approximate the learnable value epiplexity aims to measure directly
- Does procedural knowledge drive reasoning more than factual retrieval?
  Explores whether models learn reasoning through general procedures across diverse documents rather than memorizing specific facts. This matters for understanding what pretraining data actually teaches models to reason.
  a content-type account of which data generalizes; epiplexity offers a measure-theoretic account of the same phenomenon
- Can deep learning theory unify around training dynamics?
  Is learning mechanics (focused on average-case predictions and training dynamics rather than worst-case bounds) the emerging framework that finally unifies fragmented deep learning theory?
  situates epiplexity within the compute-aware theory-of-learning program
Original note title
epiplexity measures the structural information a computationally bounded observer can extract for data selection