INQUIRING LINE

Can fixed-size latent states losslessly store arbitrary input context?

This explores whether a compressed, fixed-width memory (the kind state-space models and recurrent architectures carry forward) can hold everything in a long input without losing information — and what the corpus says about the limits of that ambition.


The question is whether a fixed-size latent state (the running summary that recurrent and state-space models carry instead of attending back over every past token) can preserve arbitrary input perfectly. The short corpus answer is no, and the cleanest result makes it a provable no: two-layer transformers can copy exponentially long strings precisely because attention lets them look back at the raw sequence, while state-space models are fundamentally bounded by the capacity of their fixed-size latent state and degrade at copying and retrieval in both toy and pretrained settings (see "Can state-space models match transformers at copying and retrieval?"). The underlying argument is a counting one: a state stored in b bits can take at most 2^b distinct values, so once there are more possible inputs than that, two different inputs must land on the same state and can no longer be told apart. Past that point something has to be thrown away, and losslessness is impossible by construction.
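The counting argument behind "impossible by construction" can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: it assumes the state is stored at finite precision (so it has a definite bit budget), and the function names and the example sizes are hypothetical, not taken from any of the cited notes.

```python
def max_lossless_tokens(state_bits: int, vocab_size: int) -> int:
    """Largest n such that every length-n sequence can get its own state.

    A state of `state_bits` bits has 2**state_bits distinct values, and there
    are vocab_size**n sequences of length n, so we need
    vocab_size**n <= 2**state_bits. Exact integer arithmetic, no rounding.
    """
    limit = 1 << state_bits
    n, sequences = 0, 1
    while sequences * vocab_size <= limit:
        sequences *= vocab_size
        n += 1
    return n


def must_collide(n_tokens: int, state_bits: int, vocab_size: int) -> bool:
    """True when the pigeonhole principle forces two inputs onto one state."""
    return vocab_size ** n_tokens > (1 << state_bits)


# Hypothetical sizes: a 4096-dimensional state at 16 bits per entry,
# and a vocabulary of 50,000 tokens.
bits = 4096 * 16
limit = max_lossless_tokens(bits, 50_000)
print("upper bound on losslessly stored tokens:", limit)
print("collision forced one token later:", must_collide(limit + 1, bits, 50_000))
```

The bound is an upper limit on what any encoder could achieve, not a claim about what a trained model reaches in practice. A learned state typically stores far less than its raw bit count would allow.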

The more interesting move in the collection is to question whether lossless storage is even the right target. Several lines treat compression as the feature, not the bug. ReadAgent deliberately compresses documents into lossy "gist memories" before it knows the task, then re-reads the original passages only when a detail is needed; this lossy-plus-lookup design extends effective context 3–20× and beats retrieval baselines (see "Can LLMs read long documents like humans do?"). An external, RL-trained context manager pushes the same idea further: the optimal amount of compression isn't "none," it's whatever matches the downstream agent's reliability. Strong agents get high-fidelity context, while weaker ones need aggressive pruning to stay coherent (see "Can external managers compress context better than frozen agents?"). So the design frontier isn't lossless state; it's *intelligently lossy* state.
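The lossy-plus-lookup pattern described above can be sketched in miniature. This is a toy under loud assumptions, not ReadAgent's implementation: where the real system would have a language model write the gists and choose which pages to re-read, this sketch substitutes word truncation and word overlap, and every function name here is hypothetical.

```python
def make_gist(page: str, max_words: int = 8) -> str:
    """Lossy stand-in for a model-written summary: keep the first few words."""
    return " ".join(page.split()[:max_words])


def build_memory(pages):
    """Compress every page before any question is known."""
    return [make_gist(p) for p in pages]


def lookup(question, gists, pages, top_k=1):
    """Rank pages by gist/question word overlap, then return the ORIGINAL text.

    The gists only decide where to look; the answer is read from the
    uncompressed page, so details dropped by the gist are still reachable.
    """
    q_words = set(question.lower().split())
    ranked = sorted(
        range(len(gists)),
        key=lambda i: -len(q_words & set(gists[i].lower().split())),
    )
    return [pages[i] for i in ranked[:top_k]]


pages = [
    "The reactor design uses molten salt coolant. Its operating temperature is 700 C.",
    "The budget section lists staffing costs. Total headcount is 42.",
]
gists = build_memory(pages)
hits = lookup("what is the reactor operating temperature", gists, pages)
print(gists[0])  # the gist alone has lost the temperature
print(hits[0])   # the re-read page still contains it
```

The design point is that the compressed memory never has to be lossless, because it serves as an index into the originals and does not replace them.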

A second reframing says the bottleneck isn't storage capacity at all. One line argues the real cost of long context is the compute needed to consolidate evicted tokens into the model's internal fast weights: an offline "sleep" phase whose quality scales with how many consolidation passes you spend, like test-time scaling (see "Is long-context bottleneck really about memory or compute?"). On that view, a fixed-size state can hold a great deal *if* you pay enough compute to fold context into it well; what you can't do is fold it in for free. The Titans architecture splits the difference architecturally, keeping quadratic attention for short-term precision while routing surprising tokens into a separate compressed long-term memory module, and scaling to contexts of 2M+ tokens without the quadratic penalty (see "Can neural memory modules scale language models beyond attention limits?"). The implicit admission: a single uniform state can't do it, so you partition memory by what deserves exact recall.
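The claim that a fixed-size state holds more when you spend more compute folding context into it can be illustrated with a toy fast-weight memory. The sketch below is an assumption-laden illustration, not the method of the cited note or of Titans: it uses a single `dim × dim` matrix as the state, random key/value pairs as the "evicted context", and plain full-batch gradient descent as the consolidation pass. All names and sizes are hypothetical.

```python
import random


def matvec(M, k):
    """Read from the state: multiply matrix M by key vector k."""
    return [sum(m * x for m, x in zip(row, k)) for row in M]


def recall_error(M, pairs):
    """Mean squared error when reading each stored value back out of M."""
    total = 0.0
    for k, v in pairs:
        total += sum((p - t) ** 2 for p, t in zip(matvec(M, k), v))
    return total / len(pairs)


def consolidate(pairs, dim, passes, lr=0.1):
    """Fold (key, value) pairs into a fixed dim x dim state.

    Each pass is one full-batch gradient step on 0.5 * sum ||M k - v||^2.
    The state size never changes; only the number of passes does.
    """
    M = [[0.0] * dim for _ in range(dim)]
    for _ in range(passes):
        grad = [[0.0] * dim for _ in range(dim)]
        for k, v in pairs:
            pred = matvec(M, k)
            for i in range(dim):
                err = pred[i] - v[i]
                for j in range(dim):
                    grad[i][j] += err * k[j]
        for i in range(dim):
            for j in range(dim):
                M[i][j] -= lr * grad[i][j]
    return M


def make_pairs(n, dim, seed=0):
    """Random unit-norm keys with random values (illustrative data only)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        k = [rng.gauss(0, 1) for _ in range(dim)]
        norm = sum(x * x for x in k) ** 0.5
        pairs.append(([x / norm for x in k], [rng.gauss(0, 1) for _ in range(dim)]))
    return pairs


pairs = make_pairs(n=6, dim=8)
errors = {p: recall_error(consolidate(pairs, 8, p), pairs) for p in (1, 10, 100)}
for p, e in errors.items():
    print(f"{p:>3} passes -> recall error {e:.6f}")
```

With fewer pairs than state dimensions, recall error keeps falling as passes are added. With more pairs than dimensions, the error generally cannot be driven to zero however many passes are spent. In this toy, compute buys better use of the state but does not remove the capacity bound.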

What the reader probably didn't expect is how deep the "no" goes. The limit isn't only about copying tokens: it echoes a broader formal result that any computable LLM must fail on infinitely many inputs no matter its architecture, because finite machinery can't perfectly cover an unbounded input space (see "Can any computable LLM truly avoid hallucinating?"). Fixed-size lossless storage runs into the same kind of finiteness wall. The corpus's constructive answer, then, is to stop chasing perfect recall and instead engineer *which* losses you can tolerate: gist-and-retrieve, surprise-gated memory, reliability-matched pruning, and compute-for-consolidation are all ways of being smart about the information you've decided to drop. Latent-state methods that scale a separate latent dimension independently of parameter count (see "Can latent thought vectors scale language models beyond parameters?") grow the budget, but growing it is not the same as making it infinite.


Sources 7 notes

Can state-space models match transformers at copying and retrieval?

Two-layer transformers can copy exponentially long strings while state-space models are fundamentally limited by their fixed-size latent state. Empirically, transformers dramatically outperform SSMs at copying and context retrieval in both synthetic and pretrained settings.

Can LLMs read long documents like humans do?

ReadAgent compresses documents into gist memories before knowing the task, then retrieves details only when needed, extending effective context 3–20× and outperforming retrieval baselines on long-document QA.

Can external managers compress context better than frozen agents?

An external RL-trained manager can adaptively prune context for frozen agents, with the key insight that stronger agents benefit from high-fidelity preservation while weaker agents need aggressive compression to stay reliable.

Is long-context bottleneck really about memory or compute?

The bottleneck is framed not as memory capacity but as the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.

Can neural memory modules scale language models beyond attention limits?

Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.

Can any computable LLM truly avoid hallucinating?

Three formal theorems prove that any computable LLM must hallucinate on infinitely many inputs, and internal mechanisms like self-correction cannot eliminate this mathematical constraint. External safeguards are therefore necessary, not optional.

Can latent thought vectors scale language models beyond parameters?

Latent-Thought Language Models achieve superior sample and parameter efficiency by coupling fast local variational learning with slow global decoder learning. This dual-rate scheme scales few-shot reasoning across both model and latent size, creating independent scaling dimensions beyond traditional parameter scaling.
