INQUIRING LINE

Can task-agnostic compression of documents remain broadly useful for later queries?

This explores whether you can boil documents down *before* you know what you'll later ask of them — and whether that early, query-blind compression stays useful once the real questions arrive.


The corpus has a clean demonstration that the answer is yes, with a crucial design caveat. ReadAgent compresses long documents into 'gist memories' *before knowing the task*, then reaches back for the original details only when a question actually needs them, extending effective context 3–20× and beating retrieval baselines on long-document QA (Can LLMs read long documents like humans do?). The trick isn't that the gist is enough; it's that the gist is a pointer back to the full text. Compression is task-agnostic; *reconstruction* is task-specific.
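The gist-then-look-up pattern can be sketched in a few lines. This is a toy illustration under stated assumptions, not ReadAgent's implementation: `make_gist`, `build_memory`, and `lookup` are hypothetical names, the "summarizer" just keeps the first few words of each page, and relevance is plain word overlap where a real system would use a language model for both steps.

```python
from dataclasses import dataclass


@dataclass
class Page:
    text: str  # original passage, kept verbatim so it stays recoverable
    gist: str  # short summary written before any question is known


def make_gist(text: str, gist_words: int) -> str:
    # Stand-in summarizer (assumption): keep the first few words.
    return " ".join(text.split()[:gist_words])


def build_memory(document: str, page_size: int = 40, gist_words: int = 8) -> list[Page]:
    # Task-agnostic step: split into pages and gist each one.
    words = document.split()
    pages = []
    for start in range(0, len(words), page_size):
        chunk = " ".join(words[start:start + page_size])
        pages.append(Page(text=chunk, gist=make_gist(chunk, gist_words)))
    return pages


def lookup(pages: list[Page], question: str, top_k: int = 1) -> list[str]:
    # Task-specific step: rank pages by gist/question word overlap,
    # then return the ORIGINAL text of the best pages, not the gists.
    q_words = set(question.lower().split())
    ranked = sorted(
        pages,
        key=lambda p: len(q_words & set(p.gist.lower().split())),
        reverse=True,
    )
    return [p.text for p in ranked[:top_k]]
```

The design point is in `lookup`: the gist is used only to decide where to look, and the answer is drawn from the untouched original page.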

That distinction is where things break when it's ignored. When models compress by discarding rather than by gisting-with-recall, the losses compound silently: a test of 19 frontier models across 52 domains found roughly 25% of document content corrupted over long delegated relay tasks, with errors stacking up and never plateauing (Do frontier LLMs silently corrupt documents in long workflows?). So 'task-agnostic compression' is safe to the degree it stays recoverable, and dangerous to the degree it's final. The same act, summarize now and use later, is a feature in one architecture and rot in the other.
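A back-of-the-envelope calculation shows how small a per-step loss is needed to reach that figure. The sketch below rests on two assumptions that are not findings of the cited note: that each round trip retains a constant fraction of the content (a geometric model), and that the ~25% loss corresponds to the 50 round-trips mentioned in the source summary. Function names are hypothetical.

```python
def retained_after(round_trips: int, per_trip_retention: float) -> float:
    # Fraction of content still intact after repeated lossy hand-offs,
    # assuming every hand-off keeps the same fraction (assumption).
    return per_trip_retention ** round_trips


def implied_per_trip_retention(total_retained: float, round_trips: int) -> float:
    # Invert the geometric model: which constant per-trip retention
    # would produce the observed total?
    return total_retained ** (1.0 / round_trips)


per_trip = implied_per_trip_retention(total_retained=0.75, round_trips=50)
print(f"retention per trip: {per_trip:.4f}")       # about 0.9943
print(f"loss per trip:      {1 - per_trip:.2%}")   # about 0.57%
```

Under these assumptions, a loss of well under 1% per hand-off, small enough to pass unnoticed on any single step, is sufficient to corrupt a quarter of the document by the end of the relay.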

There's a deeper reason to expect query-blind compression to generalize at all: compression *is* generalization. Text-trained models compress images and audio better than PNG and FLAC, purely by adapting in-context, which suggests a good compressor is broadly useful by its nature rather than by being tuned to a task (Can text-trained models compress images better than specialized tools?). The same equivalence shows up in training theory: optimal language-model learning can be derived straight from a lossless-compression objective (Does optimal language model learning maximize data compression?). If learning and compression are the same thing, then a compact representation built without a target query should still carry the structure later queries need.
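The link between prediction and compression can be made concrete: with an ideal arithmetic coder, a symbol the model assigns probability p costs about -log2(p) bits, so better prediction means a shorter encoding. The sketch below is a toy, not the setup of the cited notes. It swaps the language model for a Laplace-smoothed byte-frequency model that adapts as it reads (a crude stand-in for in-context adaptation), and the function names are hypothetical.

```python
import math
from collections import Counter


def adaptive_code_length_bits(data: bytes, alphabet_size: int = 256) -> float:
    # Ideal code length under a model that starts uniform and updates
    # its byte counts after every symbol (add-one smoothing).
    counts = Counter()
    total_bits = 0.0
    for seen, symbol in enumerate(data):
        p = (counts[symbol] + 1) / (seen + alphabet_size)
        total_bits += -math.log2(p)
        counts[symbol] += 1
    return total_bits


def uniform_code_length_bits(data: bytes) -> float:
    # Baseline with no learning: every byte costs 8 bits.
    return 8.0 * len(data)


sample = b"abababababababababababababababab"
print(adaptive_code_length_bits(sample) < uniform_code_length_bits(sample))
```

The adaptive model knows nothing about the data in advance, yet it beats the fixed 8-bits-per-byte baseline on repetitive input simply because it learns the regularities as it goes.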

But 'broadly useful' turns out to depend on *who* is doing the later querying. An external, RL-trained context manager compresses well only when it matches the amount of compression to the consumer's reliability: strong agents benefit from high-fidelity preservation, while weaker agents need aggressive pruning to stay coherent (Can external managers compress context better than frozen agents?). There's no single ideal compression; the right amount of discarding is a property of the reader, not the document. And retrieval-style knowledge can itself be compressed into a small parametric decoder that plugs into any model and still preserves long-tail facts (Can retrieval knowledge compress into a tiny parametric model?), which is evidence that even very lossy-looking compression can stay query-general if it's the right kind of lossy.
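The source note for the parametric decoder describes it as plugging into a model "via output interpolation". A minimal sketch of what interpolating two next-token distributions looks like follows. The mixing weight `lam`, the function name, and the example probabilities are all hypothetical, chosen for illustration rather than taken from the cited work.

```python
def interpolate(
    p_lm: dict[str, float],
    p_mem: dict[str, float],
    lam: float = 0.3,
) -> dict[str, float]:
    # Mix the base model's next-token distribution with the compressed
    # memory's distribution. lam is the weight on the memory (assumption).
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    vocab = set(p_lm) | set(p_mem)
    return {
        tok: (1.0 - lam) * p_lm.get(tok, 0.0) + lam * p_mem.get(tok, 0.0)
        for tok in vocab
    }


# Made-up numbers: the base model prefers a common answer, the memory
# component carries a long-tail fact the base model underweights.
base = {"Paris": 0.6, "Lyon": 0.4}
memory = {"Paris": 0.2, "Lyon": 0.8}
mixed = interpolate(base, memory, lam=0.5)
```

Because the mix happens at the output distribution, the base model is left untouched, which is what lets the same compressed memory be attached to different models.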

The quiet reframing worth taking away: research on long context argues the real bottleneck was never storage capacity but the *compute* to transform raw context into a usable internal state, and that consolidation improves with more passes, like test-time scaling (Is long-context bottleneck really about memory or compute?). Read that way, task-agnostic compression isn't a lossy shortcut you tolerate; it's the up-front work of turning documents into something queryable at all. The question stops being 'will compression hurt later queries' and becomes 'how much consolidation do you pay for now versus at query time.'


Sources (7 notes)

Can LLMs read long documents like humans do?

ReadAgent compresses documents into gist memories before knowing the task, then retrieves details only when needed, extending effective context 3–20× and outperforming retrieval baselines on long-document QA.

Do frontier LLMs silently corrupt documents in long workflows?

Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.

Can text-trained models compress images better than specialized tools?

Chinchilla models trained exclusively on text achieve better compression rates on images and audio than FLAC and PNG by using their context window to adapt as task-specific compressors. This demonstrates that generalization operates through compression, not specialization.

Does optimal language model learning maximize data compression?

Research shows that optimal LM training can be derived from a lossless compression objective, yielding a Learning Law where all examples contribute equally in the optimal process. This approach improves scaling law coefficients, not just constants.

Can external managers compress context better than frozen agents?

An external RL-trained manager can adaptively prune context for frozen agents, with the key insight that stronger agents benefit from high-fidelity preservation while weaker agents need aggressive compression to stay reliable.

Can retrieval knowledge compress into a tiny parametric model?

Memory Decoder successfully compresses kNN-LM retrieval distributions into a small transformer that plugs into any LLM via output interpolation. It preserves long-tail factual knowledge while maintaining semantic coherence, reducing perplexity by 6.17 points across domains.

Is long-context bottleneck really about memory or compute?

Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.
