Reasoning and Learning Architectures

Can recurrence consolidate memory without predicting tokens?

Recurrent neural networks typically use recurrence only for prediction. But could offline recurrent passes serve a second purpose: consolidating transient context into persistent weights, as sleep does in brains?

Note · 2026-05-28 · sourced from Novel Architectures

Recurrence in sequence models is almost always in service of prediction: each step consumes a token and emits a hidden state used to predict the next token. "Language Models Need Sleep" identifies a second, under-used role — recurrence as a consolidation mechanism. During the model's sleep phase, it performs forward passes over the accumulated context while receiving no new input tokens, and uses those passes to recursively update its fast weights via a learned local rule. The recurrence is not predicting anything; it is rewriting persistent state.

The biological framing does real conceptual work; it is not decoration. In animals, hippocampal replay during sleep reactivates short-term memories and consolidates them into cortical synaptic weights, with no external input during that phase. The architecture follows the same sequence: full context window → sleep with no input tokens → multiple passes that move context-window memory into persistent weights → clear context → resume. The claim "recurrence can be used not only for prediction but also for memory consolidation" is the load-bearing insight, and the replay analogy specifies what the offline passes are for.
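The cycle above can be sketched in a few lines of plain Python. This is a toy illustration, not the paper's method: the delta rule stands in for the learned local rule, the (key, value) buffer stands in for the context window, and the names `sleep`, `matvec`, `probes`, and the parameters `passes` and `lr` are all hypothetical. The keys are chosen to be orthonormal so that the behaviour is easy to reason about.

```python
def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def sleep(W, context, passes=20, lr=0.5):
    """Offline phase: replay the buffer with no new input and rewrite W in place.

    Assumption: a hand-written delta rule replaces the learned local rule.
    """
    for _ in range(passes):
        for key, value in context:
            recalled = matvec(W, key)
            for i, row in enumerate(W):
                err = value[i] - recalled[i]
                for j, kj in enumerate(key):
                    row[j] += lr * err * kj  # local update: error times input
    context.clear()  # consolidated content no longer occupies the window


# Wake phase: items accumulate in a transient context buffer.
context = [
    ([0.5, 0.5, 0.5, 0.5], [1.0, 0.0]),
    ([0.5, -0.5, 0.5, -0.5], [0.0, 1.0]),
    ([0.5, 0.5, -0.5, -0.5], [1.0, 1.0]),
]
probes = [(list(k), list(v)) for k, v in context]  # copies kept only to check recall
W = [[0.0] * 4 for _ in range(2)]  # persistent fast weights, initially empty

# Sleep phase: no tokens are consumed and nothing is predicted.
sleep(W, context)

# After sleep the buffer is empty, yet the weights still answer the old queries.
max_err = max(
    abs(r - t)
    for key, target in probes
    for r, t in zip(matvec(W, key), target)
)
```

Because the keys are orthonormal, each replay pass halves the remaining recall error for every item, so twenty passes leave `max_err` far below any practical tolerance. With correlated keys the same loop would converge more slowly, which is one reason a real system might learn the rule rather than fix it by hand.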

The distinction separates two functions that recurrent architectures conflate. Prediction maps input to output; consolidation maps transient state to durable state. Recognizing them as distinct lets a system schedule them differently: predict at wake time under latency pressure, consolidate at sleep time under a compute budget. The move parallels Complementary Learning Systems theory's account of why brains need both a fast-encoding and a slow-consolidating subsystem. It is the transfer mechanism that the vault's CLS-analogy note flags as missing from most AI memory systems: a way to move repeated short-term content into the slow-learning substrate. Counterpoint: a learned local update rule on fast weights is a lossy, parameterized consolidation. It is not guaranteed to preserve what later queries need, so consolidation quality is itself a failure surface. Why it matters: it gives the field a concrete computational primitive for the long-missing sleep-consolidation step.
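The counterpoint about lossy consolidation can be made concrete with a deliberately simple write rule. The sketch below assumes a one-shot Hebbian outer-product write into a fixed-size weight vector, which is not the paper's learned rule; `hebbian_consolidate`, `recall`, and `worst_recall_error` are hypothetical names. It compares recall error when the stored keys are orthogonal with recall error when they overlap.

```python
def hebbian_consolidate(pairs, dim):
    """One-shot Hebbian write: add value * key into the weights for each pair."""
    w = [0.0] * dim
    for key, value in pairs:
        for j in range(dim):
            w[j] += value * key[j]
    return w


def recall(w, key):
    """Read a stored scalar value back out by projecting the weights onto the key."""
    return sum(wj * kj for wj, kj in zip(w, key))


def worst_recall_error(w, pairs):
    """Largest absolute gap between what was stored and what is recalled."""
    return max(abs(recall(w, key) - value) for key, value in pairs)


# Two memories with orthogonal keys, and two with overlapping keys (dot product 0.6).
orthogonal = [([1.0, 0.0], 1.0), ([0.0, 1.0], -1.0)]
overlapping = [([1.0, 0.0], 1.0), ([0.6, 0.8], -1.0)]

err_orth = worst_recall_error(hebbian_consolidate(orthogonal, 2), orthogonal)
err_overlap = worst_recall_error(hebbian_consolidate(overlapping, 2), overlapping)
# err_orth is 0.0; err_overlap is about 0.6 because the two memories interfere.
```

A check of this kind, run against the buffered context before it is cleared, is one way a system could notice that a sleep phase failed to preserve its contents. Whether the approach in the paper includes such a check is not stated in this note.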


— "Language Models Need Sleep", https://arxiv.org/abs/2605.26099

Original note title

recurrence can serve memory consolidation not only prediction