SYNTHESIS NOTE

Can models learn working memory by attending to their own latents?

Can a feedback loop that lets transformers attend to their own internal representations enable them to process indefinitely long sequences without adding extra weights? This note explores whether working memory can emerge from self-attention over the model's own latents rather than from external modules.

Synthesis note · 2026-06-03 · sourced from LLM Architecture

Because self-attention cost grows quadratically with sequence length, a Transformer can only attend over a bounded window at once. The result resembles "anterograde amnesia": vast long-term memory stored in the weights, but short-term memory limited to the attention window. TransformerFAM (Feedback Attention Memory) adds a feedback loop that lets the network attend to its own latent representations, which fosters the emergence of working memory and lets the model process indefinitely long sequences. Two practical virtues follow: it requires no additional weights, so it integrates seamlessly with pretrained models, and it improves long-context performance at the 1B, 8B, and 24B scales.
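
The feedback idea described above can be sketched in a few lines of NumPy. This is a minimal, single-head, single-layer illustration under stated assumptions, not the TransformerFAM implementation: the names `fam_layer`, `fam_init`, and `block_size` are hypothetical, causal masking inside a block is omitted, and the sliding-window memory segments a real model would keep are left out. What it does show is the core loop: the sequence is processed block by block, a small set of memory latents is carried between blocks, and those latents are updated with the same projection matrices as ordinary tokens, so no new weights are introduced.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(q, k, v):
    # Plain scaled dot-product attention (no causal mask in this sketch).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

def fam_layer(x, Wq, Wk, Wv, block_size, fam_init):
    """Process x of shape (seq_len, d) block by block with a feedback memory.

    fam_init has shape (fam_len, d). All names here are illustrative
    assumptions, not identifiers from the paper.
    """
    fam = fam_init
    outputs = []
    for start in range(0, x.shape[0], block_size):
        block = x[start:start + block_size]
        # Context = previous memory latents + current block (the feedback path).
        ctx = np.concatenate([fam, block], axis=0)
        k, v = ctx @ Wk, ctx @ Wv
        # Block tokens attend to the current block and the carried memory.
        outputs.append(attend(block @ Wq, k, v))
        # Memory attends to the same context to update itself, reusing
        # the same Wq/Wk/Wv: no additional weights are needed.
        fam = attend(fam @ Wq, k, v)
    return np.concatenate(outputs, axis=0), fam

# Usage: 12 tokens, blocks of 4, a 2-slot memory.
rng = np.random.default_rng(0)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
x = rng.normal(size=(12, d))
out, fam = fam_layer(x, Wq, Wk, Wv, block_size=4, fam_init=np.zeros((2, d)))
print(out.shape, fam.shape)
```

Per-block cost depends only on `block_size` and the memory length, not on the total sequence length, which is why a loop of this shape can in principle run over arbitrarily long inputs while the memory stays a fixed size.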

The key takeaway is the reframing of memory as feedback over the model's own latents rather than as a bolted-on external store. Working memory emerges from the architecture attending to itself, and because the mechanism adds no weights, existing models can be retrofitted with it.

This note sits in the vault's long-context/memory cluster as a weight-free, feedback-based route. It shares the attend-to-own-latents mechanism with the looped/recurrent architectures covered in "Can reasoning happen in latent space during pretraining?", and it complements "Can neural memory modules scale language models beyond attention limits?" (Titans adds a memory module) and "Can recurrent memory scale where attention fails on ultra-long text?" (recurrent state).

Original note title

feedback attention to a model's own latents fosters working memory for unbounded sequences without extra weights