Can zero-weight drift through external memory replace parameter plasticity entirely?

This explores whether agents can keep learning entirely through external memory — no weight updates at all — and whether that can fully replace the older idea of learning by changing the model's parameters.

The question has two parts: can 'learning' move entirely out of a model's weights and into external memory, and is that swap total, or does some learning still have to happen in the parameters? The corpus leans surprisingly far toward 'yes, memory can carry most of the load,' but it also quietly marks where the substitution stops being clean.

The strongest evidence for substitution comes from agents that improve without ever touching their weights. AgentFly reframes the whole learning loop as memory operations: credit assignment and policy improvement happen in case, subtask, and tool memory, and it still hits competitive benchmark scores with a frozen model (see "Can agents learn continuously from experience without updating weights?"). Reflexion shows the same trick on failure: an agent writes a verbal self-diagnosis into episodic memory and does better next episode, no gradient step required (see "Can agents learn from failure without updating their weights?"). VOYAGER pushes it further into lifelong territory, storing executable skills in a searchable library and composing new ones from old, which sidesteps the catastrophic forgetting that weight-update methods suffer (see "Can agents learn new skills without forgetting old ones?"). So a real chunk of what we used to call 'plasticity' genuinely relocates outside the network.
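To make "learning with no gradient step" concrete, here is a minimal sketch of a Reflexion-style loop. It is not code from any of the systems above: `frozen_policy` is a hypothetical stand-in for a frozen LLM (a fixed function of task and memory), and `environment`, `ACTIONS`, and the note format are toy assumptions. The point is only that behavior improves across episodes while the policy function itself never changes.

```python
from dataclasses import dataclass, field


@dataclass
class EpisodicMemory:
    """Append-only store of lessons; the only state that changes between episodes."""
    notes: list = field(default_factory=list)

    def add(self, note: str) -> None:
        self.notes.append(note)


# Hypothetical toy action space (an assumption for illustration).
ACTIONS = ("push", "pull", "slide")


def frozen_policy(task: str, memory: EpisodicMemory) -> str:
    """Stand-in for a frozen LLM: picks the first action not ruled out by memory."""
    ruled_out = {
        note.split(":")[1] for note in memory.notes if note.startswith(f"{task}:")
    }
    for action in ACTIONS:
        if action not in ruled_out:
            return action
    return ACTIONS[0]  # everything ruled out: fall back to the first action


def environment(task: str, action: str) -> bool:
    """Toy environment giving an unambiguous success/failure signal."""
    return action == "slide"


def run_episodes(task: str, n_episodes: int):
    """Run episodes; after each failure, write a self-diagnosis into memory."""
    memory = EpisodicMemory()
    outcomes = []
    for _ in range(n_episodes):
        action = frozen_policy(task, memory)
        success = environment(task, action)
        outcomes.append(success)
        if not success:
            memory.add(f"{task}:{action}:failed")
    return outcomes, memory


outcomes, memory = run_episodes("open_door", 4)
print(outcomes)      # [False, False, True, True]
print(memory.notes)  # ['open_door:push:failed', 'open_door:pull:failed']
```

In a real agent the notes would be free-text reflections placed in the prompt, and the policy would be a model call; the structure of the loop is what carries over.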

But the corpus also says the *shape* of the memory matters more than people assume, which is the first crack in 'replace entirely.' Frozen-model agents using causal-form memory (memory that records when a lesson applies, not just what happened) beat generic reflection by 23 points and transfer better to new environments (see "Can frozen language models continually improve through memory structure alone?"). That's a clue that memory isn't a neutral substitute for weights; you have to engineer it carefully to recover what plasticity gave you for free. Some researchers go as far as calling memory architecture the new scaling frontier, arguing that returns from restructuring memory now exceed returns from adding parameters (see "Has memory architecture replaced parameter count as the scaling frontier?"). And on the architecture side, Titans bakes a learned neural-memory module into the model itself to scale past attention's limits, which blurs the tidy line between 'memory' and 'weights': its memory *has* parameters that adapt (see "Can neural memory modules scale language models beyond attention limits?").
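One way to picture the difference between causal-form memory and generic reflection is as a data-structure choice. The sketch below is an illustration under assumptions, not the format used in the cited work: `CausalLesson`, its `applies_when` field, and the set-of-features context are hypothetical names chosen here. A causal-form entry carries its applicability conditions, so retrieval can filter on them; a generic reflection log returns everything regardless of context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CausalLesson:
    """A lesson plus the conditions under which it applies (hypothetical schema)."""
    applies_when: frozenset  # context features that must all be present
    advice: str


def retrieve(lessons, context):
    """Causal-form retrieval: keep lessons whose conditions all hold in `context`."""
    return [l.advice for l in lessons if l.applies_when <= context]


def generic_reflection(lessons):
    """Baseline: every past note is returned, whether or not it applies."""
    return [l.advice for l in lessons]


lessons = [
    CausalLesson(frozenset({"door_locked", "has_key"}), "use the key before pushing"),
    CausalLesson(frozenset({"door_locked"}), "look for a key first"),
    CausalLesson(frozenset({"floor_wet"}), "walk, do not run"),
]

# New situation: a locked door, no key in hand, dry floor.
print(retrieve(lessons, {"door_locked"}))  # ['look for a key first']
print(len(generic_reflection(lessons)))    # 3
```

The filtered result is what would plausibly help transfer: in an unfamiliar environment, lessons whose conditions do not hold stay out of the prompt instead of misleading the agent.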

Here's the thing you might not have expected: even where weights still do the learning, most of them never change. RL turns out to update only 5–30% of parameters, in nearly identical sparse subnetworks across random seeds, so plasticity is already far more concentrated and structured than 'retrain the whole net' implies (see "Does reinforcement learning update only a small fraction of parameters?"). That reframes the whole question: it's not memory-versus-plasticity as two rival mechanisms, but a spectrum of how much you change inside versus how much you offload outside. Hybrid designs like SoftCoT make the trade explicit: freeze the big model, train a tiny auxiliary one, and you keep pretrained knowledge while still getting new behavior (see "Can continuous reasoning avoid forgetting in instruction-tuned models?").
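The "5–30% of parameters" claim is a statement about how many entries differ between a checkpoint before and after RL. A rough sketch of that measurement on flattened parameter lists follows. The function names, the toy values, and the use of Jaccard overlap to compare seeds are assumptions made here for illustration; the cited work may define its overlap metric differently.

```python
def changed_indices(before, after, tol=0.0):
    """Indices of parameters whose value moved by more than `tol`."""
    if len(before) != len(after):
        raise ValueError("checkpoints must have the same number of parameters")
    return {i for i, (b, a) in enumerate(zip(before, after)) if abs(a - b) > tol}


def updated_fraction(before, after, tol=0.0):
    """Fraction of parameters that changed between two checkpoints."""
    if not before:
        raise ValueError("empty checkpoint")
    return len(changed_indices(before, after, tol)) / len(before)


def jaccard(a, b):
    """Overlap of two index sets: |a & b| / |a | b| (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy checkpoints: 10 parameters, two seeds that both touch indices 2 and 7.
base = [0.5] * 10
seed_a = list(base)
seed_a[2], seed_a[7] = 0.6, 0.4
seed_b = list(base)
seed_b[2], seed_b[7] = 0.7, 0.3

print(updated_fraction(base, seed_a))  # 0.2
print(jaccard(changed_indices(base, seed_a), changed_indices(base, seed_b)))  # 1.0
```

With real models the same comparison would run per tensor over framework arrays rather than Python lists, and a small nonzero `tol` may be needed to ignore numerical noise.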

So, entirely? The corpus says memory can replace weight updates for *acquiring and reusing experience* — that substitution is real and increasingly the default for agents. What it can't replace is the base model's underlying capabilities: every memory-based method here rides on a frozen model that already knows how to read, reason, and act. Memory drift changes what an agent *does* with what it knows; it doesn't expand what it fundamentally *can* know. The honest answer is that external memory replaces plasticity for the outer loop of learning, while a small, structured core of parameter change — and the frozen pretrained substrate beneath it — remains load-bearing.

Sources (8 notes)

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can frozen language models continually improve through memory structure alone?

Agents using causal-form memory (preserving applicability conditions) outperform generic reflection by 23 points on repeated trials and gain 4-17 points transferring to new environments, showing memory shape matters more than parameter updates.

Has memory architecture replaced parameter count as the scaling frontier?

Three converging signals in late-2025 research—taxonomy maturation, memory-aware test-time scaling loops, and hybrid sparsity laws—show that returns from restructuring memory now exceed returns from adding parameters. The design bottleneck has shifted from compute to memory structure.

Can neural memory modules scale language models beyond attention limits?

Titans architecture separates attention (short-term, quadratic) from neural memory (long-term, compressed), prioritizing surprising tokens for storage. The model outperforms standard Transformers and linear RNNs across tasks while scaling to 2M+ token contexts without quadratic penalties.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL updates only 5–30% of parameters, without any explicit sparsity regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.
