How does epiplexity measure extractable value differently from compression codelength?

This explores the difference between two ways of measuring 'how much is in the data': the classic compression view (codelength — how few bits can losslessly represent it) versus epiplexity, which asks how much value a learner with limited compute can actually extract and use.

Classic compression says the value of data is its codelength: the length of the shortest description that reproduces it losslessly. This view runs deep. Language modeling is *equivalent* to lossless compression, because a model's log-loss on a sequence is the number of bits an arithmetic coder driven by that model would need to encode it. A model that compresses well also generalizes well: text-trained models compress images and audio better than specialized tools just by conditioning on context (see 'Can text-trained models compress images better than specialized tools?'). Under this lens, learning and compressing are the same act, and the best measure of information is how short you can make the file.
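The link between prediction and codelength can be made concrete with a toy model. The sketch below is an illustration only: `codelength_bits` is a hypothetical helper, and the adaptive character model with add-one smoothing is an assumption chosen for brevity, not something taken from the cited notes. It sums -log2 p(next character | context), which is the size an ideal arithmetic coder driven by that model would approach.

```python
import math
from collections import defaultdict


def codelength_bits(text: str, order: int = 0, alphabet_size: int = 256) -> float:
    """Ideal codelength in bits of `text` under an adaptive character model.

    The model conditions on the previous `order` characters and uses
    add-one smoothing over an assumed alphabet of `alphabet_size` symbols.
    """
    counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
    totals = defaultdict(int)                       # context -> total count
    bits = 0.0
    for i, ch in enumerate(text):
        ctx = text[max(0, i - order):i]
        p = (counts[ctx][ch] + 1) / (totals[ctx] + alphabet_size)
        bits += -math.log2(p)  # cost of coding this symbol
        # Update the model only after the symbol has been coded,
        # so a decoder could reproduce the same probabilities.
        counts[ctx][ch] += 1
        totals[ctx] += 1
    return bits


text = "ab" * 200
order0_bits = codelength_bits(text, order=0)  # ignores context
order1_bits = codelength_bits(text, order=1)  # sees the previous character
```

On this strongly patterned string the order-1 model predicts better than the order-0 model, so `order1_bits` comes out smaller than `order0_bits`. That is the sense in which better prediction is better compression.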

Epiplexity breaks from that by asking a different question: not 'how few bits?' but 'how much can a *bounded* learner actually pull out and put to work?' The corpus's sharpest statement of why this matters is that Shannon and Kolmogorov measures fail to value data because they assume an observer with unlimited compute (see 'Why do Shannon and Kolmogorov measures fail to value data?'). To a compressor with unlimited compute, a worked example and a raw data dump carry the same information once you account for the underlying process. A real learner has a finite budget, and that is exactly why curriculum order matters, why feature engineering helps, and why trained models can exceed the process that generated them. Codelength is blind to all of that; epiplexity is built to see it.
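A small experiment shows how much the observer's compute budget changes the answer. This is a sketch under stated assumptions, not a definition of epiplexity: `zlib` stands in for a bounded observer, and `pseudo_random_bytes` is a hypothetical helper. The pseudo-random stream is fully determined by a short generator plus a seed, so an observer with unlimited compute could describe it in a few dozen bytes, yet the bounded compressor finds almost nothing to exploit.

```python
import random
import zlib


def pseudo_random_bytes(seed: int, n: int) -> bytes:
    """Deterministic byte stream: fully specified by `seed` and `n`."""
    rng = random.Random(seed)
    return bytes(rng.getrandbits(8) for _ in range(n))


# Short description for an unbounded observer, opaque to a bounded one.
data = pseudo_random_bytes(seed=0, n=10_000)
compressed_size = len(zlib.compress(data, 9))

# Structure that the bounded compressor can actually find.
structured = b"ab" * 5_000
structured_size = len(zlib.compress(structured, 9))
```

Both inputs have very short ideal descriptions, but only `structured` shrinks under `zlib`; `compressed_size` stays close to the original 10,000 bytes. The gap between the two is a matter of what the observer can compute, not of what the data contains.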

The practical wedge between the two shows up when compression and usefulness pull apart. LLMs compress concepts far more aggressively than humans do, nailing broad category structure while discarding the fine-grained distinctions humans keep (see 'Do LLMs compress concepts more aggressively than humans do?'). By pure codelength that's a win: fewer bits, cleaner categories. By an extractable-value measure it can be a loss, because the nuance humans preserve is what lets them act in a specific situation. Maximum compression and maximum usable value are simply not the same target.

You can also watch value *increase* under transformation, which a codelength account struggles with. Rewriting Big Five personality scores as natural-language summaries surfaces second-order trait patterns that predict nine other psychological scales, and the summary-plus-score combination beats either alone (see 'Can language summaries unlock hidden psychological patterns?'). The rewrite adds extractable signal without adding raw information. The bits didn't grow; what a bounded learner could *do* with them did. That synergy is precisely the quantity codelength can't price and epiplexity is meant to.
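A toy analogue (not the personality study itself) shows the same effect in a few lines. The setup is an assumption made for illustration: the bounded learner is a single-feature sign classifier, `best_stump_accuracy` is a hypothetical helper, and the task is an XOR-style label. Adding a product feature is a deterministic rewrite of the inputs, so it adds no information, but it turns an unlearnable task into a trivial one for this learner.

```python
from itertools import product


def best_stump_accuracy(rows, labels):
    """Accuracy of the best single-feature sign classifier (a deliberately weak learner)."""
    best = 0.0
    for j in range(len(rows[0])):
        for sign in (1, -1):
            correct = sum(1 for r, y in zip(rows, labels) if sign * r[j] == y)
            best = max(best, correct / len(rows))
    return best


# All four points of {-1, 1}^2, labelled by the product of the coordinates.
raw = [list(p) for p in product((-1, 1), repeat=2)]
labels = [a * b for a, b in raw]

# Deterministic rewrite: append the product as a third feature.
engineered = [r + [r[0] * r[1]] for r in raw]

raw_acc = best_stump_accuracy(raw, labels)         # 0.5, chance level
eng_acc = best_stump_accuracy(engineered, labels)  # 1.0
```

The information content of `raw` and `engineered` is identical, since one is computed from the other, but the accuracy this weak learner can reach moves from chance to perfect.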

The takeaway: compression codelength asks how cheaply data can be stored by an ideal observer, while epiplexity asks how much a constrained learner can mine from it. Those answers diverge whenever compute is finite, ordering matters, or a clever reframing makes latent structure suddenly learnable. The whole reason data engineering, curricula, and prompt context 'work' is that they raise extractable value without changing the underlying information content at all.

Sources (4 notes)

Why do Shannon and Kolmogorov measures fail to value data?

Both measures assume observers with unlimited compute and miss learnable, useful information. The gap explains why feature engineering helps, curriculum order matters, and trained models exceed their generating process—empirical facts classical theory cannot account for.

Can text-trained models compress images better than specialized tools?

Chinchilla models trained exclusively on text achieve better compression rates on images and audio than PNG and FLAC, respectively, by using their context window to adapt as task-specific compressors. This demonstrates that generalization operates through compression, not specialization.

Do LLMs compress concepts more aggressively than humans do?

Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.

Can language summaries unlock hidden psychological patterns?

LLMs generate natural language personality summaries from Big Five scores that encode second-order trait patterns, enabling zero-shot prediction of nine other psychological scales with R² > 0.89 structural alignment. Combined summary-and-score predictions outperform either alone, showing synergistic information.
