Reasoning and Learning Architectures

Has memory architecture replaced parameter count as the scaling frontier?

Late-2025 research suggests the field's next major efficiency gains come from restructuring how models store and use experience rather than simply making them larger. Three convergent signals point to this shift.

Note · 2026-05-18 · sourced from Memory
How should we allocate compute budget at inference time? What kind of thing is an LLM really?

Three pieces of late-2025 memory research, taken together, point at the same shift: parameter count has stopped being the most useful axis to scale. Memory architecture has taken its place.

Signal one: the field can finally taxonomize itself. Two major surveys (Memory in the Age of AI Agents, AI Hippocampus), appearing within months of each other, propose orthogonal but compatible three-axis taxonomies: forms × functions × dynamics, and implicit × explicit × agentic. Surveys taxonomize after the fact; two arriving this close together suggests the design space has matured to the point where comparing systems requires a shared vocabulary. Fields tend to develop that need only when architecture is the primary variable being designed.

Signal two: memory and compute scale together, not separately. ReasoningBank's MaTTS (memory-aware test-time scaling) finding shows that test-time scaling generates contrastive signals, which improve memory, which in turn guides future scaling: a compounding loop. This frames memory-driven experience scaling as a new scaling dimension rather than a multiplier on existing ones. Parameter scaling laws (Kaplan, Chinchilla) predict loss as a function of model size, data, and compute; MaTTS suggests an additional term: cumulative interaction history processed into structured memory.
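The compounding loop can be made concrete with a toy simulation. The sketch below is not ReasoningBank's implementation: the strategy names, the `rollout` stand-in for a model attempt, and the `use`/`avoid` memory layout are all assumptions made for illustration. It shows only the shape of the loop: k rollouts per episode, the contrast between successes and failures distilled into memory, and memory steering the next episode's rollouts.

```python
import random

# Hypothetical strategy space; CORRECT is hidden from the agent and is
# used only to score rollouts.
STRATEGIES = ["greedy", "backtrack", "decompose", "guess"]
CORRECT = "decompose"


def rollout(memory: dict[str, set[str]], rng: random.Random) -> tuple[str, bool]:
    """Stand-in for one model attempt: pick a strategy, report success."""
    if memory["use"]:
        # Memory already holds a strategy that worked: reuse it.
        choice = sorted(memory["use"])[0]
    else:
        # Otherwise explore, skipping strategies memory marks as failures.
        candidates = [s for s in STRATEGIES if s not in memory["avoid"]]
        choice = rng.choice(candidates)
    return choice, choice == CORRECT


def scaled_episode(memory: dict[str, set[str]], k: int, rng: random.Random) -> float:
    """Run k parallel rollouts, then distil the contrast into memory."""
    results = [rollout(memory, rng) for _ in range(k)]
    for strategy, ok in results:
        # Contrastive signal: successes become "use", failures become "avoid".
        memory["use" if ok else "avoid"].add(strategy)
    return sum(ok for _, ok in results) / k


def run(episodes: int = 5, k: int = 2, seed: int = 0) -> list[float]:
    """Return the success rate of each episode, with memory carried across."""
    rng = random.Random(seed)
    memory: dict[str, set[str]] = {"use": set(), "avoid": set()}
    return [scaled_episode(memory, k, rng) for _ in range(episodes)]
```

Calling `run()` returns the per-episode success rate. Because every failed strategy is recorded and excluded, the final episode in this toy always succeeds. A larger k tends to reach that point in fewer episodes, which is the sense in which compute and memory compound here.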

Signal three: sparsity is multi-dimensional. Engram's U-shaped scaling law shows that conditional memory and conditional computation are complementary sparsity axes: at matched parameter count and matched FLOPs, a pure mixture-of-experts (MoE) model underperforms a hybrid of MoE and lookup. The largest gains appear in reasoning, not retrieval, because separating local lookup from global integration frees attention for composition. Parameters distributed across memory and computation outperform parameters concentrated in either alone.
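To show what two sparsity axes in one layer can look like, here is a minimal pure-Python sketch. It is not Engram's architecture: the sizes, the bigram hash, the top-1 router, and every name (`lookup`, `route`, `hybrid_layer`) are assumptions chosen for illustration. Per token, exactly one table row is read (conditional memory) and exactly one expert is evaluated (conditional computation), and the two results are summed.

```python
import random

DIM, N_EXPERTS, TABLE_SIZE = 4, 2, 16  # hypothetical toy sizes
rng = random.Random(0)


def rand_vec(n):
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Conditional memory: a hashed bigram table; one row is read per token.
table = [rand_vec(DIM) for _ in range(TABLE_SIZE)]
# Conditional computation: one DIM x DIM weight matrix per expert.
experts = [[rand_vec(DIM) for _ in range(DIM)] for _ in range(N_EXPERTS)]
# One router score vector per expert.
router = [rand_vec(DIM) for _ in range(N_EXPERTS)]


def lookup(prev_id, cur_id):
    """Constant-time memory read keyed by the local bigram (no matmul)."""
    return table[(prev_id * 31 + cur_id) % TABLE_SIZE]


def route(x):
    """Top-1 routing: index of the expert with the highest router score."""
    scores = [dot(w, x) for w in router]
    return scores.index(max(scores))


def hybrid_layer(hidden, token_ids):
    """Sum one expert's output and one memory row for each position."""
    out = []
    for i, x in enumerate(hidden):
        # Assumption: id 0 stands in for "no previous token" at position 0.
        prev_id = token_ids[i - 1] if i > 0 else 0
        mem = lookup(prev_id, token_ids[i])
        expert = experts[route(x)]
        computed = [dot(row, x) for row in expert]
        out.append([c + m for c, m in zip(computed, mem)])
    return out
```

For example, `hybrid_layer([[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]], [3, 7])` returns two vectors of length `DIM`. The point of the design is that growing `TABLE_SIZE` adds parameters without adding per-token arithmetic, while growing `N_EXPERTS` adds parameters that still cost a matrix-vector product when selected.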

The convergent story: returns from adding parameters are diminishing along a known curve; returns from restructuring memory are still in their early steep phase. This does not mean parameters stop mattering. It means the marginal next-generation improvement is more likely to come from architectural restructuring of memory than from another order of magnitude in size.

The counter-evidence — and why it sharpens rather than undermines the take. "Useful Memories Become Faulty" demonstrates that naive consolidation can regress below the no-memory baseline. This is exactly what should be expected if memory architecture is the bottleneck: the design choices in how to maintain memory matter more than whether to have it. The fragility is itself evidence that memory is the active variable. Parameter-count scaling does not have the same brittleness — adding parameters rarely makes a model worse. Adding consolidation can.

The writing angle: the prior scaling law era was about pretraining compute. The current era is about memory structures that determine how experience gets converted into improved behavior — and that conversion mechanism is now the design problem.

Original note title

memory architecture is the new scaling dimension — taxonomy surveys plus MaTTS plus Engram U-curve suggest memory has overtaken parameter count