LLM Reasoning and Architecture · Reinforcement Learning for LLMs

When do language models stop memorizing and start generalizing?

Can we measure the exact capacity limit where models transition from memorizing training data to learning underlying patterns? Understanding this boundary could reshape how we think about model learning and privacy.

Note · 2026-02-23 · sourced from Memory
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The standard approach to measuring memorization — attempting to extract training data from the model — is fundamentally flawed. Language models can be coerced to output almost any string, so generation is not proof of memorization. Conversely, a model may memorize patterns (every other token, structural regularities) without reproducing text verbatim. Extraction is neither necessary nor sufficient.

The formal separation: unintended memorization is the information a model contains about a specific dataset (the bits that would change if a particular example were removed from training). Generalization is the information the model contains about the true data-generation process. By isolating and eliminating the generalization component, total memorization becomes measurable.
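
One way to write this separation down is as a pair of description-length definitions. This is a sketch: the notation (H for description length in bits, \hat\theta for the trained model, \theta_ref for a reference model that captures only the true data-generation process) is introduced here to match the note's wording, not quoted from the source.

```latex
% Sketch notation, not the source's: H(.) is a description length in bits,
% \hat\theta the trained model, \theta_ref a reference model that captures
% only the true data-generation process.
\begin{align*}
\mathrm{mem}(x, \hat\theta) &= H(x) - H(x \mid \hat\theta)
  && \text{total information the model provides about } x \\
\mathrm{gen}(x) &= H(x) - H(x \mid \theta_{\mathrm{ref}})
  && \text{explained by the true process (generalization)} \\
\mathrm{mem}_U(x, \hat\theta) &= H(x \mid \theta_{\mathrm{ref}}) - H(x \mid \hat\theta,\, \theta_{\mathrm{ref}})
  && \text{unintended memorization: bits beyond the true process}
\end{align*}
```

Eliminating the generalization term, for instance by measuring on data with no learnable structure so that \theta_ref explains nothing, makes the unintended term coincide with total memorization, which is what makes the capacity measurement below possible.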

The key empirical finding: GPT-family models have an approximate capacity of 3.6 bits-per-parameter for unintended memorization. Models memorize training data until this capacity fills. At that point, a phase transition occurs — grokking begins, and unintended memorization decreases as models begin to generalize.
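
As a rough illustration of what a 3.6 bits-per-parameter budget implies, here is a back-of-envelope sketch. The 3.6 figure comes from the note; the bits-per-token value and the model sizes are illustrative assumptions, not measurements.

```python
# Back-of-envelope sketch: when would a 3.6 bits/parameter memorization budget fill?
BITS_PER_PARAM = 3.6   # unintended-memorization capacity per parameter (from the note)
BITS_PER_TOKEN = 1.0   # assumed non-generalizable information per training token

def tokens_to_saturate(n_params: float) -> float:
    """Training tokens at which the memorization budget would roughly fill."""
    return n_params * BITS_PER_PARAM / BITS_PER_TOKEN

for n_params in (125e6, 1.3e9, 7e9):
    capacity_bits = n_params * BITS_PER_PARAM
    print(f"{n_params:.2e} params: ~{capacity_bits / 8 / 1e9:.2f} GB of capacity, "
          f"fills after ~{tokens_to_saturate(n_params):.2e} tokens")
```

Under these assumed numbers the budget is tiny next to ordinary pretraining corpora, which fits the note's claim that the transition is reached during training rather than at some privileged step count.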

This reframes the grokking phenomenon mechanistically. Building on What happens inside models when they suddenly generalize?, the capacity-filling measurement adds the trigger condition: grokking doesn't begin at an arbitrary training step; it begins when memorization saturates. The three phases are downstream of a capacity constraint, not of training duration per se.
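
If saturation is the trigger, one testable consequence is that a model's unintended-memorization curve should peak just before its generalization metrics improve. A minimal sketch of locating that peak from a logged curve; the curve shape and all numbers below are invented for illustration, not the paper's method.

```python
import numpy as np

# Toy logged curve of unintended memorization (bits) vs. training step.
steps = np.arange(0, 10_000, 100)
mem_bits = np.minimum(steps * 50.0, 300_000.0)        # fills linearly, then saturates
mem_bits -= np.maximum(0.0, steps - 6_000.0) * 2.0    # slow decline once generalization starts

# Predicted onset: the step at which unintended memorization peaks (i.e., saturates).
onset_step = int(steps[np.argmax(mem_bits)])
print(f"memorization peaks at step {onset_step}; grokking predicted to begin around there")
```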

The practical implication: memorization capacity is a measurable property of a specific model, not a property of the training algorithm. Two models trained by the same algorithm on the same data can have different memorization properties. This matters for privacy (which models leak more), for understanding generalization (capacity constrains when it begins), and for the Can AI pass every test while understanding nothing? question — a model that appears to generalize may simply have unfilled memorization capacity.
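
Because memorization is a property of the specific trained model, it has to be estimated per model. A simple compression-style proxy, sketched here under assumptions rather than as the source's exact procedure, compares how many bits a target model saves on a training example relative to a reference model that stands in for general knowledge of the distribution. Both model names below are placeholders, and the two models are assumed to share a tokenizer.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_to_encode(model, tokenizer, text: str) -> float:
    """Arithmetic-coding view: description length of `text` in bits is its
    total negative log2-likelihood under the model."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**enc, labels=enc["input_ids"])
    n_predicted = enc["input_ids"].shape[1] - 1          # first token has no prediction
    return out.loss.item() * n_predicted / math.log(2)   # mean nats/token -> total bits

# Placeholder model names; any target/reference pair sharing a tokenizer works in principle.
target = AutoModelForCausalLM.from_pretrained("my-finetuned-model")  # trained on the dataset in question
reference = AutoModelForCausalLM.from_pretrained("gpt2")             # stand-in for the "true process" model
tok = AutoTokenizer.from_pretrained("gpt2")

example = "a training example whose memorization we want to estimate"
extra_bits = bits_to_encode(reference, tok, example) - bits_to_encode(target, tok, example)
print(f"unintended-memorization proxy: {max(0.0, extra_bits):.1f} bits")
```

The design choice is that extra compression relative to the reference, rather than verbatim extraction, is what counts as evidence of memorization, matching the note's argument that extraction is neither necessary nor sufficient.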


Source: Memory

Original note title

llm memorization formally separates into unintended memorization and generalization — 3.6 bits-per-parameter capacity fills before grokking begins