When do language models stop memorizing and start generalizing?
Can we measure the exact capacity limit where models transition from memorizing training data to learning underlying patterns? Understanding this boundary could reshape how we think about model learning and privacy.
The standard approach to measuring memorization — attempting to extract training data from the model — is fundamentally flawed. Language models can be coerced to output almost any string, so generation is not proof of memorization. Conversely, a model may memorize patterns (every other token, structural regularities) without reproducing text verbatim. Extraction is neither necessary nor sufficient.
The formal separation: unintended memorization is the information a model contains about a specific dataset (the bits that would change if a particular example were removed from training). Generalization is the information the model contains about the true data-generation process. Isolating and eliminating the generalization component makes total memorization measurable.
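A minimal sketch of one way to operationalize this split (not the source's exact estimator; the function name, the nats-to-bits conversion, and the clipping at zero are illustrative assumptions): compare how well the trained model predicts an example against a reference model that stands in for pure generalization, i.e. one that never saw that example.

```python
import math

def unintended_memorization_bits(nll_trained: float, nll_reference: float) -> float:
    """Illustrative per-example estimate of unintended memorization.

    nll_trained:   negative log-likelihood (in nats) the trained model assigns to an example
    nll_reference: negative log-likelihood (in nats) from a reference model capturing only
                   the generalization component (it never saw this example)

    The gap, converted from nats to bits and clipped at zero, approximates the information
    the trained model holds about this specific example beyond what generalization explains.
    """
    return max(0.0, (nll_reference - nll_trained) / math.log(2))
```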
The key empirical finding: GPT-family models have an approximate capacity of 3.6 bits-per-parameter for unintended memorization. Models memorize training data until this capacity fills. At that point, a phase transition occurs — grokking begins, and unintended memorization decreases as models begin to generalize.
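Back-of-the-envelope arithmetic with the 3.6 bits-per-parameter figure (the parameter count below is hypothetical, chosen only to show the calculation):

```python
def memorization_capacity_bytes(n_params: int, bits_per_param: float = 3.6) -> float:
    """Total unintended-memorization capacity implied by a bits-per-parameter estimate."""
    return n_params * bits_per_param / 8  # 8 bits per byte

# A hypothetical 1M-parameter model saturates at roughly 450 KB of memorized information;
# once that budget fills, further gains on the training set must come from generalization.
print(memorization_capacity_bytes(1_000_000))  # 450000.0
```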
This reframes the grokking phenomenon mechanistically. Building on "What happens inside models when they suddenly generalize?", the capacity-filling measurement adds the trigger condition: grokking doesn't begin at an arbitrary training step — it begins when memorization saturates. The three phases are downstream of a capacity constraint, not of training duration per se.
The practical implication: memorization capacity is a measurable property of a specific model, not a property of the training algorithm. Two models trained by the same algorithm on the same data can have different memorization properties. This matters for privacy (which models leak more), for understanding generalization (capacity constrains when it begins), and for the "Can AI pass every test while understanding nothing?" question — a model that appears to generalize may simply have unfilled memorization capacity.
Source: Memory
Related concepts in this collection
- What happens inside models when they suddenly generalize? Grokking appears as an abrupt shift from memorization to generalization. But is the underlying process truly discontinuous, or does mechanistic analysis reveal continuous phases we can measure and predict? Relation: capacity-filling provides the trigger mechanism for when grokking begins.
- Can we predict keyword priming before learning happens? Exploring whether the degree to which newly learned keywords contaminate unrelated contexts can be predicted from measurable properties before training begins, and what mechanisms enable this prediction. Relation: a complementary view of how memorization interacts with learning.
- Can we prune training data without hurting model performance? This explores whether difficulty metrics can identify redundant training examples that can be safely removed. It matters because most datasets contain massive waste — if we can find which examples are truly necessary, we could train better models on far less data. Relation: if memorization has finite capacity, pruning removes low-value items that consume capacity.
Original note title: llm memorization formally separates into unintended memorization and generalization — 3.6 bits-per-parameter capacity fills before grokking begins