Do models with unfilled memorization capacity appear to generalize falsely?

This explores a sharp claim from memorization-capacity research — that until a model's fixed memorization budget fills up, what looks like generalization is really memorized pattern-matching wearing a reasoning costume.

The corpus has a surprisingly crisp answer to a question you might not have known had one. The key finding is that memorization is not unlimited: GPT-family models hold roughly 3.6 bits per parameter, and that capacity is a property of the model, not of the training recipe (see "When do language models stop memorizing and start generalizing?"). Only when that budget fills does a phase transition, known as "grokking", flip the model from storing examples to genuinely generalizing. The implication runs in the opposite direction to your question: a model with unfilled capacity has not started generalizing yet, so when it looks like it is reasoning, it is often still leaning on stored answers.
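To make the capacity figure concrete, here is a minimal back-of-envelope sketch. It assumes only the approximate 3.6 bits-per-parameter value quoted above; the function names, the 1B-parameter model, and the `dataset_bits` input are hypothetical, and the text does not say how the information content of a training set should be measured.

```python
# Back-of-envelope capacity arithmetic. The 3.6 bits-per-parameter figure is
# the approximate value quoted above; everything else here is illustrative.
BITS_PER_PARAM = 3.6


def memorization_capacity_bits(n_params: int, bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated total memorization budget of a model, in bits."""
    return n_params * bits_per_param


def capacity_fill_ratio(dataset_bits: float, n_params: int) -> float:
    """Ratio of dataset information content to the model's budget.

    dataset_bits is a hypothetical input: how to measure the information
    content of a training set is not specified in the text above.
    """
    return dataset_bits / memorization_capacity_bits(n_params)


if __name__ == "__main__":
    n_params = 1_000_000_000  # a hypothetical 1B-parameter model
    capacity = memorization_capacity_bits(n_params)
    print(f"capacity: {capacity:.3g} bits (~{capacity / 8 / 1e6:.0f} MB)")
    for dataset_bits in (1e9, 3.6e9, 1e11):
        ratio = capacity_fill_ratio(dataset_bits, n_params)
        print(f"dataset {dataset_bits:.3g} bits -> fill ratio {ratio:.2f}")
```

Under these assumptions a 1B-parameter model has a budget of about 3.6 billion bits, roughly 450 MB. Treating a fill ratio of 1 as the point where the budget is exhausted is a simplification for illustration, not a claim about where grokking actually begins.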

What does that false generalization look like in practice? The clearest illustration is attestation bias: ask a model whether a premise entails a hypothesis, and it answers based on whether the hypothesis appeared in its training data, not on whether the premise actually supports it. Feed it a random, irrelevant premise and it still says "entails" as long as the hypothesis is familiar (see "Do LLMs predict entailment based on what they memorized?"). That is generalization theater: the logical-inference behavior is real on the surface and hollow underneath. The same shape shows up when strong training priors override what is written in the prompt, so that the model ignores its own context because memorized associations dominate (see "Why do language models ignore information in their context?").
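The random-premise check described above can be sketched as a small probe. This is an illustrative harness, not the procedure from McKenna et al.: the two `*_toy_model` functions are hypothetical stand-ins for a real model call, and the example pairs are invented for the demonstration.

```python
import random
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (premise, hypothesis)

# Invented example data for the demonstration.
PAIRS: List[Pair] = [
    ("Paris is the capital of France", "Paris is in France"),
    ("Water boils at 100 C at sea level", "Water can boil"),
    ("Dogs are mammals", "Dogs are animals"),
]
ATTESTED = {hypothesis for _, hypothesis in PAIRS}


def attested_toy_model(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in showing attestation bias: ignores the premise."""
    return "entails" if hypothesis in ATTESTED else "not-entails"


def overlap_toy_model(premise: str, hypothesis: str) -> str:
    """Hypothetical stand-in that at least looks at the premise (crude lexical check)."""
    subject = hypothesis.split()[0]
    return "entails" if subject in premise.split() else "not-entails"


def random_premise_entail_rate(
    predict: Callable[[str, str], str], pairs: List[Pair], seed: int = 0
) -> float:
    """Fraction of hypotheses still labelled 'entails' after the premise is
    swapped for one taken from a different item."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to swap premises")
    rng = random.Random(seed)
    kept = 0
    for i, (_, hypothesis) in enumerate(pairs):
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        unrelated_premise = pairs[j][0]
        if predict(unrelated_premise, hypothesis) == "entails":
            kept += 1
    return kept / len(pairs)


if __name__ == "__main__":
    print("attested toy:", random_premise_entail_rate(attested_toy_model, PAIRS))
    print("overlap toy: ", random_premise_entail_rate(overlap_toy_model, PAIRS))
```

A rate that stays high after the premises are shuffled suggests the prediction is driven by the hypothesis alone. In practice `predict` would wrap a call to the model under test, and the pairs would come from an entailment dataset rather than a hand-written list.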

The corpus also localizes where this leakage happens inside a chain of reasoning. The STIM analysis finds that token-level memorization has three sources (local, mid-range, and long-range), and "local" memorization, which predicts the next token from the immediately preceding ones, accounts for up to 67% of reasoning errors. The problem grows worse as tasks get harder and drift from the training distribution (see "Where do memorization errors arise in chain-of-thought reasoning?"). So even mid-reasoning, the model is quietly substituting recall for inference exactly where the task is hardest.

The flip side sharpens the picture: real generalization has a different signature. Tracing five million pretraining documents shows that reasoning draws on broad, transferable *procedural* knowledge spread across many sources, whereas factual recall depends on narrow, document-specific memorization of the exact target fact (see "Does procedural knowledge drive reasoning more than factual retrieval?"). Genuine reasoning is diffuse; faked reasoning is a lookup. And there is an optimistic coda: the capacity that has not been "filled" may instead be latent and merely dormant. A single training example in RLVR can lift math accuracy from 36% to 73.6%, and test accuracy keeps improving for 1,400 steps after training accuracy reaches 100% (see "Can a single training example unlock mathematical reasoning?"). So unfilled capacity is not only a liability that breeds false generalization; sometimes it is unexpressed ability waiting for the right activation signal.

The thing worth taking away: "does it generalize?" and "does it look like it generalizes?" are genuinely different questions, and the corpus gives you concrete fingerprints — attestation bias, prior-over-context override, local token memorization — to tell counterfeit reasoning from the real thing.

Sources (6 notes)

When do language models stop memorizing and start generalizing?

GPT-family models have a measurable memorization capacity of approximately 3.6 bits-per-parameter. When this capacity fills, a phase transition triggers grokking—the shift from memorization to genuine generalization. This capacity is a property of individual models, not training algorithms.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Where do memorization errors arise in chain-of-thought reasoning?

The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization, based on the immediately preceding tokens, accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can a single training example unlock mathematical reasoning?

A single example in RLVR boosts math performance from 36% to 73.6% and enables test accuracy to improve for 1,400 steps after training accuracy reaches 100%, revealing that minimal activation signals unlock latent reasoning capability.
