Has memory architecture replaced parameter count as the scaling frontier?
Late-2025 research suggests the field's next major efficiency gains come from restructuring how models store and use experience rather than simply making them larger. Three convergent signals point to this shift.
Three pieces of late-2025 memory research, taken together, point to the same shift: parameter count is no longer the axis with the highest marginal return, and memory architecture is taking its place.
Signal one: the field can finally taxonomize itself. Two major surveys (Memory in the Age of AI Agents, AI Hippocampus), appearing within months of each other, propose orthogonal but compatible three-axis taxonomies: forms × functions × dynamics, and implicit × explicit × agentic. Surveys taxonomize after the fact; two arriving this close together suggests the design space has matured to the point where comparing systems requires a shared vocabulary. A field typically develops that need only when architecture is the primary variable being designed.
Signal two: memory and compute scale together, not separately. ReasoningBank's MaTTS (memory-aware test-time scaling) finding shows that test-time scaling generates contrastive signals, which improve memory, which in turn guides future scaling: a compounding loop. This makes memory-driven experience scaling a candidate scaling law in its own right rather than a multiplier on existing ones. Parameter scaling laws (Kaplan, Chinchilla) predict loss as a function of model size, data, and compute; MaTTS suggests an additional term: cumulative interaction history processed into structured memory.
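The compounding loop can be sketched as a toy simulation. This is a minimal illustration under stated assumptions, not ReasoningBank's implementation: `Memory`, `solve`, `distill`, `run`, the success rule, and the task names are all hypothetical stand-ins. The only point is the shape of the loop: several rollouts per task produce a success/failure contrast, the contrast is distilled into a memory item, and that item changes later rollouts.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Memory:
    """Hypothetical store of distilled strategy notes (not ReasoningBank's schema)."""
    items: list[str] = field(default_factory=list)

    def retrieve(self, task: str) -> list[str]:
        # Toy retrieval: return every note filed under the task's family prefix.
        family = task.split(":")[0]
        return [note for note in self.items if note.startswith(family + ":")]


def solve(task: str, hints: list[str], attempt: int) -> bool:
    """Stand-in for one agent rollout; a real system would call a model here.

    Toy rule (an assumption): with no hints the first attempt fails and later
    attempts succeed; with at least one hint every attempt succeeds.
    """
    return bool(hints) or attempt > 0


def distill(task: str, outcomes: list[bool]) -> Optional[str]:
    """Emit a note only when rollouts disagree, i.e. a contrast exists."""
    if any(outcomes) and not all(outcomes):
        family = task.split(":")[0]
        return f"{family}: prefer the approach used by the successful rollouts"
    return None


def run(tasks: list[str], k: int) -> tuple[Memory, list[float]]:
    """Process tasks in order with k rollouts each, updating memory as we go."""
    memory = Memory()
    success_rates = []
    for task in tasks:
        hints = memory.retrieve(task)
        # Test-time scaling: k rollouts for the same task.
        outcomes = [solve(task, hints, attempt) for attempt in range(k)]
        success_rates.append(sum(outcomes) / k)
        note = distill(task, outcomes)
        if note is not None and note not in memory.items:
            memory.items.append(note)
    return memory, success_rates
```

In this toy, `run(tasks, k=1)` never produces a contrast, so memory stays empty, while `k=3` fills memory after the first task in each family and later tasks in that family benefit. That is the sense in which scaling and memory feed each other here.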
Signal three: sparsity is multi-dimensional. Engram's U-shaped scaling law shows that conditional memory and conditional computation are complementary sparsity axes: pure MoE underperforms hybrid MoE+lookup at matched parameter count and matched FLOPs. The largest gains appear in reasoning, not retrieval, because separating local lookup from global integration frees attention for composition. Parameters distributed across memory and computation outperform parameters concentrated in either alone.
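Why a split can beat either extreme is easy to see in a toy allocation sweep. The loss model below is hypothetical: two power-law terms with made-up constants (`a`, `alpha`, `b`, `beta`, `budget`), chosen only to show how an interior optimum arises. It is not Engram's fitted law, and unlike a real system it diverges at the endpoints, so the sweep stays strictly inside them.

```python
def toy_loss(memory_fraction: float, budget: float = 1e9,
             a: float = 20.0, alpha: float = 0.3,
             b: float = 50.0, beta: float = 0.3) -> float:
    """Hypothetical loss when `memory_fraction` of a fixed parameter budget
    goes to lookup memory and the rest to conditional computation."""
    memory_params = memory_fraction * budget
    compute_params = (1.0 - memory_fraction) * budget
    return a / memory_params ** alpha + b / compute_params ** beta


def sweep(steps: int = 19) -> list[tuple[float, float]]:
    """Evaluate the toy loss on an evenly spaced grid strictly inside (0, 1)."""
    fractions = [(i + 1) / (steps + 1) for i in range(steps)]
    return [(f, toy_loss(f)) for f in fractions]


curve = sweep()
best_fraction, best_loss = min(curve, key=lambda point: point[1])
print(f"best memory fraction on the grid: {best_fraction:.2f}")
```

Because each term shrinks with diminishing returns in its own share of the budget, starving either side costs more than it saves, and the minimum falls at an interior allocation. Where that minimum sits depends entirely on the assumed constants.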
The convergent story: returns from adding parameters are diminishing along a known curve; returns from restructuring memory are still in their early steep phase. This does not mean parameters stop mattering. It means the marginal next-generation improvement is more likely to come from architectural restructuring of memory than from another order of magnitude in size.
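The "known curve" can be made concrete with the parametric form used in the Chinchilla analysis, where N is parameter count, D is training tokens, and E is the irreducible loss. The exponent value below is an illustrative assumption (fitted values vary across studies), used only to work the arithmetic.

```latex
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}

% Scaling parameters by 10x at fixed data only shrinks the parameter-limited term:
\frac{A}{(10N)^{\alpha}} = 10^{-\alpha} \cdot \frac{A}{N^{\alpha}},
\qquad 10^{-0.3} \approx 0.50 \quad \text{(assuming } \alpha = 0.3\text{)}
```

Under that assumption, each additional order of magnitude in parameters roughly halves the parameter-limited part of the loss and leaves E and the data term untouched, which is what diminishing returns means here.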
The counter-evidence — and why it sharpens rather than undermines the take. "Useful Memories Become Faulty" demonstrates that naive consolidation can regress below the no-memory baseline. This is exactly what should be expected if memory architecture is the bottleneck: the design choices in how to maintain memory matter more than whether to have it. The fragility is itself evidence that memory is the active variable. Parameter-count scaling does not have the same brittleness — adding parameters rarely makes a model worse. Adding consolidation can.
The writing angle: the prior scaling-law era was about pretraining compute. The current era is about memory structures that determine how experience gets converted into improved behavior, and that conversion mechanism is now the design problem.
Related concepts in this collection
- Can three axes replace the short-term long-term memory split?
  Does breaking agent memory into forms, functions, and dynamics provide a clearer framework than the traditional short-term/long-term distinction? This matters because current agent-memory literature lacks a unified vocabulary, making comparison between systems nearly impossible.
  taxonomy signal
- Can agents learn better from their failures than successes?
  Does storing reasoning strategies extracted from both successful and failed experiences improve agent learning compared to tracking only successes or raw trajectories? This matters because failures offer preventative lessons that successes alone cannot teach.
  MaTTS as new scaling axis
- Can lookup memory and computation work together better than either alone?
  Mixture-of-Experts handles dynamic logic, but static knowledge might need a different mechanism. Can a hybrid approach combining conditional computation with fast lookup outperform pure sparse models?
  Engram U-curve
- Does agent memory degrade when continuously consolidated?
  Can consolidating agent experiences into summaries actually harm long-term performance? Research on ARC-AGI tasks suggests continuous memory updates may reduce capability below the no-memory baseline.
  fragility as evidence that memory is the active variable
- Can recursive subtask trees overcome context window limits?
  Explores whether modeling reasoning as prunable trees of subtasks could eliminate the context length constraints that currently force developers into multi-agent architectures. Asks if working memory can become truly unlimited through selective KV cache retention.
  architectural memory restructuring for working layer
- Can neural memory modules scale language models beyond attention limits?
  Can separating short-term attention from adaptive long-term memory allow models to efficiently handle context windows exceeding 2M tokens while maintaining competitive performance?
  Titans/Miras as memory-architecture shift
- Is agent memory a storage problem or a connectivity problem?
  Most systems treat memory as a repository to store and retrieve. But what if memory's real usefulness depends on how units are linked together rather than what is stored?
  extends: connectivity-not-storage specifies which memory design choice the scaling-dimension thesis depends on
Original note title
memory architecture is the new scaling dimension — taxonomy surveys plus MaTTS plus Engram U-curve suggest memory has overtaken parameter count