Why does each rewrite cycle degrade domain-specific details differently than compression?
This explores the difference between two ways AI loses information: a single compression step (squeezing content into fewer bits or tokens) versus repeated rewrite cycles (passing a document through many edit or relay passes), and why the corpus treats them as fundamentally different failure modes rather than the same loss seen twice. The short version: compression loses detail predictably and once; rewrite cycles lose it cumulatively and silently, with no floor.
Compression is a bounded, lawful operation. When LLMs compress, they trade fine-grained distinctions for broad category structure in a way that follows rate-distortion logic: they keep the gist and shed the nuance, and the loss is roughly predictable from how aggressively you compress (see: Do LLMs compress concepts more aggressively than humans do?). This is even the source of their strength: text-trained models act as task-specific compressors that beat specialized codecs such as PNG and FLAC, evidence that generalization operates through compression (see: Can text-trained models compress images better than specialized tools?). You can also compress deliberately while protecting the long tail: a small parametric decoder can absorb retrieval knowledge and still preserve rare facts (see: Can retrieval knowledge compress into a tiny parametric model?). The defining feature is that one compression step has a known distortion budget.
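To make "known distortion budget" concrete, here is a minimal sketch that uses uniform scalar quantization as a stand-in for a lossy compression step. It is not how an LLM compresses; the function names and the step size are illustrative assumptions. One pass has a worst-case error of half the step, and re-applying the same quantizer changes nothing, so the loss has a floor.

```python
def quantize(values: list[float], step: float) -> list[float]:
    """Uniform scalar quantizer: snap each value to the nearest multiple of `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    return [round(v / step) * step for v in values]


def max_error(original: list[float], approx: list[float]) -> float:
    """Largest absolute difference between paired values."""
    return max(abs(a - b) for a, b in zip(original, approx))


STEP = 0.25  # hypothetical compression setting; a coarser step means a larger budget
signal = [0.12, 0.47, 0.91, 1.33, 2.08]

once = quantize(signal, STEP)
again = quantize(once, STEP)  # a second pass sees only the compressed output

# The distortion budget: no value moves by more than half a step.
print(max_error(signal, once) <= STEP / 2)  # True
# Idempotence: re-compressing already-compressed data adds no new loss.
print(again == once)  # True
```

The budget is fixed by the step size before any data arrives, which is what makes a single compression step predictable.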
Rewrite cycles behave nothing like this. Across long delegated workflows, frontier models silently corrupt roughly a quarter of document content over repeated round-trips, and, crucially, the errors compound without plateauing through 50 passes (see: Do frontier LLMs silently corrupt documents in long workflows?). Each rewrite treats the previous (already-drifted) output as ground truth, so small perturbations stack multiplicatively instead of settling at a stable lossy floor. Compression has a distortion budget; iterated rewriting has compound interest.
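The compounding can be sketched as a toy model, assuming (hypothetically) that each full rewrite independently drops any surviving detail with a fixed probability, and that a dropped detail never comes back because the next pass only sees the previous output. The per-pass rate below is illustrative, not fitted to the roughly-a-quarter figure above. Under these assumptions retention decays geometrically and has no positive floor, unlike a single compression step.

```python
def retained_fraction(per_pass_loss: float, passes: int) -> float:
    """Expected share of original details still present after `passes` full rewrites.

    Toy assumption: each pass drops each surviving detail independently with
    probability `per_pass_loss`, and nothing lost is ever restored.
    """
    if not 0.0 <= per_pass_loss <= 1.0:
        raise ValueError("per_pass_loss must be in [0, 1]")
    if passes < 0:
        raise ValueError("passes must be non-negative")
    return (1.0 - per_pass_loss) ** passes


PER_PASS_LOSS = 0.01  # hypothetical: a barely noticeable 1% slip per rewrite

for n in (1, 10, 50):
    print(n, round(retained_fraction(PER_PASS_LOSS, n), 3))
# 1 0.99
# 10 0.904
# 50 0.605
```

The toy is about the shape, not the numbers: a loss that looks negligible per pass keeps compounding, because no pass ever recovers what an earlier one dropped.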
Domain-specific details are the first casualties of that compounding, and here's the part you might not expect: specialized facts are not inherently fragile; what the model loses is the *signal that tells it a detail mattered*. Over-specialized models fail at domain boundaries not gradually but as a cliff, because specialization strips out the calibration signals needed to flag uncertainty (see: Why do specialized models fail outside their domain?). In a rewrite chain, a technical term or edge-case caveat is exactly the kind of low-frequency content a confident model paraphrases away without flagging that anything was lost. Compression at least discards detail in a way correlated with its statistical weight; rewriting discards it wherever the model is overconfident, which is unpredictable and undetectable from the output alone.
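The "silent" part can be illustrated with a clause-level toy. Assume (hypothetically) a rewrite step that keeps a clause only if the model's own familiarity score for it clears a threshold; the scores, the threshold, and the helper names are invented for illustration. The dropped caveat leaves no marker, and the rewritten text still reads fluently, so the loss is visible only by diffing against the original.

```python
def drop_unfamiliar(clauses: list[str], familiarity: dict[str, float], threshold: float) -> list[str]:
    """Toy rewrite step: silently omit clauses the model scores as unfamiliar.

    `familiarity` stands in for the model's internal confidence rather than any
    property visible in the text, so nothing in the output marks what was removed.
    """
    return [c for c in clauses if familiarity.get(c, 0.0) >= threshold]


def silent_losses(original: list[str], rewritten: list[str]) -> list[str]:
    """Clauses from `original` missing in `rewritten`; computing this needs the original."""
    kept = set(rewritten)
    return [c for c in original if c not in kept]


original = ["Retry failed requests", "with exponential backoff", "except on HTTP 409"]
familiarity = {  # hypothetical scores: the edge-case caveat looks unimportant to the model
    "Retry failed requests": 0.95,
    "with exponential backoff": 0.9,
    "except on HTTP 409": 0.3,
}

rewritten = drop_unfamiliar(original, familiarity, threshold=0.5)
print(" ".join(rewritten))                 # Retry failed requests with exponential backoff
print(silent_losses(original, rewritten))  # ['except on HTTP 409']
```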
The corpus also points at the fix, which sharpens the distinction. The ACE framework argues against full rewrites of evolving content: instead it uses generation-reflection-curation loops that make incremental, additive updates, because full rewrites cause 'brevity bias' and context collapse, where detail erodes (see: Can context playbooks prevent knowledge loss during iteration?). That's the tell: the danger isn't compression per se, it's the *rewrite*, regenerating the whole thing from scratch each cycle, where every pass is a fresh opportunity to drop what the model doesn't recognize as important.
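The difference between full rewrites and additive curation can be sketched as follows. This is not ACE's implementation: the `Playbook` class, the id-keyed entries, and the brevity-biased regenerator are hypothetical stand-ins. The additive path only inserts or revises entries by id, so anything it does not touch survives verbatim; the full-rewrite path keeps only what the regenerator chooses to emit.

```python
from dataclasses import dataclass, field
from typing import Callable

Entries = dict[str, str]


@dataclass
class Playbook:
    """Hypothetical evolving context: id-keyed entries that are added or revised, never regenerated."""
    entries: Entries = field(default_factory=dict)

    def apply_delta(self, delta: Entries) -> None:
        # Incremental curation: merge new or revised entries; untouched ones stay exactly as they were.
        self.entries.update(delta)


def full_rewrite(entries: Entries, regenerate: Callable[[Entries], Entries]) -> Entries:
    # Full rewrite: the regenerator's output replaces everything, so any omitted entry is simply gone.
    return dict(regenerate(entries))


def brevity_biased(entries: Entries) -> Entries:
    # Hypothetical regenerator that keeps only the first two entries, mimicking brevity bias.
    return dict(list(entries.items())[:2])


start = {
    "r1": "Validate inputs before calling the API.",
    "r2": "Use ISO 8601 dates in reports.",
    "r3": "Edge case: fiscal year starts in April for this client.",
}

curated = Playbook(dict(start))
curated.apply_delta({"r4": "Cache lookups within a single task."})

rewritten = full_rewrite(start, brevity_biased)

print(sorted(curated.entries))  # ['r1', 'r2', 'r3', 'r4']
print(sorted(rewritten))        # ['r1', 'r2']
```

In the curated playbook the edge-case entry survives untouched; in the rewritten one it disappears without any error being raised, which is the erosion the paragraph describes.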
Sources (6 notes)
Using Rate-Distortion Theory on cognitive datasets, LLMs capture broad category structure but lose fine-grained distinctions humans preserve. LLMs maximize compression efficiency; humans trade compression for contextual meaning that enables situated action.
Chinchilla models trained exclusively on text achieve better compression rates on images and audio than PNG and FLAC, respectively, by using their context window to adapt as task-specific compressors. This demonstrates that generalization operates through compression, not specialization.
Memory Decoder successfully compresses kNN-LM retrieval distributions into a small transformer that plugs into any LLM via output interpolation. It preserves long-tail factual knowledge while maintaining semantic coherence, reducing perplexity by 6.17 points across domains.
Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.
Models optimized for single domains perform exceptionally in-domain but generate confidently incorrect responses outside their scope. This occurs because specialization removes the calibration signals needed to flag uncertainty, making the performance drop abrupt rather than gradual.
The ACE framework treats contexts as evolving playbooks using generation-reflection-curation loops rather than full rewrites. This prevents knowledge loss from compression and detail erosion, achieving +10.6% on agentic tasks and +8.6% on finance without labeled supervision.