Can a reasoning model's thinking trace compress context effectively?

Does the raw reasoning trace produced by a thinking model naturally function as a context compressor without specialized training or modules? And how does this compare to dedicated compression methods?

Note · 2026-05-28 · sourced from Context Engineering

Context compression — shortening long inputs with minimal information loss to accelerate inference — has been treated as a separate problem requiring specialized compression modules or compression-specific training. Thinking as Compression (TaC) collapses that separation. Its claim is that thinking is already a process of distilling and compressing information, so a reasoning model's thinking trace can serve directly as the compressed context. The trace focuses on key information, skips redundancies, revisits important evidence, and links scattered facts into a compact form — exactly the operations a compressor performs.

Empirically, simply prompting a thinking model to generate a trace and using that trace as the shortened context already outperforms most representative compression methods, with no dedicated compressor and no compression-specific training. The intrinsic capability was hiding in plain sight: the same mechanism that lets reasoning models solve hard problems also produces a usable compression of their inputs.
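The procedure described above (prompt a thinking model, keep its trace, hand the trace to a downstream model instead of the full context) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the `Model` type, the function names, and the prompt wording are all hypothetical, and it assumes the thinking model's call returns the raw trace as plain text.

```python
from typing import Callable

# Hypothetical stand-in for any model call: prompt text in, generated text out.
# For the thinking model, this sketch assumes the returned text IS the trace.
Model = Callable[[str], str]


def compress_with_trace(context: str, question: str, thinker: Model) -> str:
    """Ask a thinking model to reason over the full context; return its trace."""
    prompt = (
        "Think through the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
    )
    return thinker(prompt)


def answer_from_trace(trace: str, question: str, reader: Model) -> str:
    """Answer using only the trace as context; the original input is not sent."""
    prompt = f"Context:\n{trace}\n\nQuestion: {question}\nAnswer:"
    return reader(prompt)
```

In use, `thinker` and `reader` would wrap whatever model clients are available; they may be the same model or different ones. No compression module and no compression-specific training step appears anywhere in the sketch, which is the point of the claim.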

The conceptual payoff is a reframing of what a thinking trace is. It is usually read as a means to an answer; TaC reads it as a reusable artifact: a compressed representation of the relevant context that downstream models can consume. This rhymes with a deeper equivalence in the vault. The note "Can text-trained models compress images better than specialized tools?" argues that modeling and compression are two views of the same operation, and TaC extends that identity from the model's weights to its traces. It also connects to evidence that trace verbosity is largely separable from reasoning content: per "Can minimal reasoning chains match full explanations?", the compressive core of a trace is small.

Counterpoint, which motivates the companion finding: raw traces struggle with budget control and shortcut behaviors, so "the trace is a compressor" needs a control mechanism to be reliable.

Why it matters: TaC eliminates a whole class of dedicated compression machinery by repurposing reasoning the model already does.

— "Thinking as Compression: Your Reasoning Model is Secretly a Context Compressor", https://arxiv.org/abs/2605.28713
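The budget-control counterpoint can be made concrete with a small measurement sketch. It is not taken from the paper: the helper names are hypothetical, and whitespace splitting is used as a crude stand-in for a real tokenizer. It only shows how one might check how much a trace compresses its input and by how much it overshoots a target budget.

```python
def token_count(text: str) -> int:
    """Approximate token count by whitespace splitting (an assumption;
    a real tokenizer would give different numbers)."""
    return len(text.split())


def compression_ratio(context: str, trace: str) -> float:
    """Context length divided by trace length; higher means more compression."""
    return token_count(context) / max(token_count(trace), 1)


def budget_overshoot(trace: str, budget_tokens: int) -> int:
    """Tokens by which the trace exceeds the budget, or 0 if it fits."""
    return max(token_count(trace) - budget_tokens, 0)
```

A raw trace that compresses well can still report a nonzero overshoot for a given budget, which is the gap a control mechanism would need to close.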

Related concepts in this collection

Can text-trained models compress images better than specialized tools? Do general-purpose language models trained only on text outperform domain-specific compressors like PNG and FLAC on their native data? This tests whether compression ability is universal or requires domain specialization.
the foundational modeling-is-compression identity TaC extends to thinking traces
Can minimal reasoning chains match full explanations? Does removing all explanatory text from chain-of-thought reasoning preserve accuracy? This tests whether verbose intermediate steps are necessary for solving problems or just artifacts of how language models are trained.
evidence that a trace's compressive core is small and verbosity is separable
Can we steer reasoning toward brevity without retraining? This explores whether model reasoning style occupies learnable geometric directions in activation space, and whether we can shift toward concise thinking by steering through that space without expensive retraining.
a different route to compact traces, via activation steering rather than prompting the trace as context
Can thinking traces be made reliably budget-controllable? Raw thinking traces compress well but ignore budget targets and take shortcuts. Can reward optimization make them controllable and useful for deployment?
extends: the companion finding this note's counterpoint anticipates — raw traces lack budget control, so TaC-C adds reward training to make "the trace is a compressor" reliable

Original note title

a reasoning model's thinking trace is itself an effective context compressor requiring no dedicated module
