INQUIRING LINE

Why does semantic deduplication reduce memorization in fine-tuned models?

This explores why removing near-duplicate examples from a fine-tuning set blunts a model's tendency to memorize verbatim — and what that reveals about how memorization gets triggered in the first place.


Semantic deduplication strips out examples that say nearly the same thing, and doing so reduces a fine-tuned model's tendency to memorize and later leak its training data. The short version from the corpus: memorization is driven less by *seeing* sensitive content and more by seeing it *repeatedly*, and dedup attacks exactly that. Controlled experiments across GPT-2, Phi-3, and Gemma-2 show that fine-tuning on repeated sensitive records pushes privacy leakage from a baseline of 0–5% up to 60–75%, and that four complementary defenses (semantic dedup, differential privacy, entropy filtering, and pattern filtering) eliminate that leakage while preserving 94.7% of the model's utility (see "Does repeated sensitive data in fine-tuning cause memorization?"). The mechanism is repetition. Two paraphrases of the same fact aren't redundant to the loss function; they're two gradient nudges toward the same content, and a handful of those is all it takes.
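As a rough illustration of the data-side intervention described above, here is a minimal sketch of greedy semantic deduplication. It is not the method used in the cited experiments. The `embed` function is a toy stand-in (character trigram counts) for a real sentence encoder, so it only catches surface-level near-duplicates rather than true paraphrases. The function names and the `0.8` similarity threshold are illustrative assumptions.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a sentence encoder (assumption): character n-gram counts."""
    s = " ".join(text.lower().split())  # normalize case and whitespace
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_dedup(examples, threshold=0.8, embed_fn=embed):
    """Greedy dedup: keep an example only if its similarity to every
    previously kept example is below `threshold` (hypothetical value)."""
    kept, kept_vecs = [], []
    for text in examples:
        vec = embed_fn(text)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept
```

Swapping `embed_fn` for a real embedding model would be what makes the filter semantic rather than lexical. The greedy pass compares each candidate against every kept example, so it is quadratic in the worst case and suited only to small fine-tuning sets.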

How few? Strikingly few. Work on knowledge priming finds that just three training exposures are enough to establish the effect, and that whether a piece of content will be primed is *predictable in advance* from its pre-learning keyword probability: a threshold of roughly 10^-3 separates contexts where priming occurs from those where it doesn't (see "Can we predict keyword priming before learning happens?"). That reframes what dedup is doing: it isn't scrubbing 'dangerous' text, it's keeping any single semantic item below the repetition count at which learning crosses over from generalization into verbatim recall.
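The idea of keeping each semantic item below a repetition count can be sketched as a simple cap on exposures per cluster. This is an illustration only. `cluster_of` is a hypothetical function that maps an example to its semantic cluster (for instance, the output of a dedup or clustering step). The default cap of 2 is an assumption chosen to stay under the three exposures reported above.

```python
from collections import defaultdict


def cap_exposures(examples, cluster_of, max_exposures=2):
    """Keep at most `max_exposures` examples per semantic cluster,
    preserving the original order of the dataset."""
    seen = defaultdict(int)  # cluster id -> number of examples kept so far
    kept = []
    for example in examples:
        cluster = cluster_of(example)
        if seen[cluster] < max_exposures:
            seen[cluster] += 1
            kept.append(example)
    return kept
```

The cap counts occurrences in the dataset, not gradient updates. Training for several epochs multiplies the effective number of exposures, so the cap would need to account for the epoch count.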

What's surprising is how *localized* memorization turns out to be, which is why a blunt data-side intervention can be so effective. When GPT-Neo memorizes a paragraph, it leaves a fingerprint: larger gradients in the lower layers and a specific low-layer attention head attending to rare tokens, with the whole recall hinging on a few early-prefix tokens (see "Where does a model store memorized paragraphs?"). Memorization looks like a narrow, targetable circuit rather than a diffuse property, so denying it the repeated rare-token co-occurrences it feeds on, which is what semantic dedup does, starves the circuit at the source instead of requiring unlearning after the fact.

The deeper payoff is seeing memorization as the *default* the model slides toward when generalization is easy to skip. Models tested on entailment turn out to predict 'entailed' based on whether the hypothesis appears in training data rather than whether the premise actually supports it, a memorized-proposition shortcut standing in for reasoning (see "Do LLMs predict entailment based on what they memorized?"). And in chain-of-thought, up to 67% of reasoning errors trace to *local* memorization keyed off immediately preceding tokens (see "Where do memorization errors arise in chain-of-thought reasoning?"). Dedup matters because every duplicated example is an invitation to take that shortcut.
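The entailment shortcut can be probed with a random-premise check: replace each premise with one from a different example and see whether the rate of 'entailed' predictions drops. The sketch below is a hedged illustration, not the cited study's protocol. `predict` is a hypothetical callable `(premise, hypothesis) -> bool` standing in for a model, and both helper names are assumptions.

```python
import random


def entailment_rate(predict, pairs):
    """Fraction of (premise, hypothesis) pairs predicted as entailed.
    Assumes `pairs` is non-empty."""
    return sum(bool(predict(p, h)) for p, h in pairs) / len(pairs)


def random_premise_pairs(pairs, seed=0):
    """Pair each hypothesis with a premise taken from a different example."""
    if len(pairs) < 2:
        raise ValueError("need at least two pairs to swap premises")
    rng = random.Random(seed)
    premises = [p for p, _ in pairs]
    swapped = []
    for i, (_, hypothesis) in enumerate(pairs):
        j = rng.choice([k for k in range(len(pairs)) if k != i])
        swapped.append((premises[j], hypothesis))
    return swapped
```

A predictor that genuinely uses the premise should show a lower entailment rate on the swapped pairs. One that keys on memorized hypotheses will show little or no change. A randomly drawn premise can still entail the hypothesis by chance, so the comparison is indicative rather than exact.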

If you want the optimistic counterpoint: memorization and generalization don't have to be enemies competing for the same weights. Wide & Deep architectures train them jointly as separate, cooperating components: the memorization half stays small precisely because the generalization half absorbs the common cases, leaving memorization to handle only the genuinely rare items (see "Can one model memorize and generalize better than two?"). Read together, the corpus suggests semantic dedup works for the same reason that architecture works: memorization is cheapest when content repeats, so the most effective lever is controlling repetition rather than fighting the model's recall after it's already formed.


Sources (6 notes)

Does repeated sensitive data in fine-tuning cause memorization?

Controlled experiments on GPT-2, Phi-3, and Gemma-2 show fine-tuning with repeated sensitive data increases privacy leakage from baseline 0-5% to 60-75%. Four complementary defenses—semantic dedup, differential privacy, entropy filtering, and pattern filtering—eliminate leakage while preserving 94.7% utility.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Where does a model store memorized paragraphs?

Memorized paragraphs leave a distinctive fingerprint in GPT-Neo: larger gradients in lower layers, concentration in a specific low-layer attention head attending to rare tokens, and dependence on a few early-prefix tokens. This localization makes memorization targetable for unlearning.

Do LLMs predict entailment based on what they memorized?

McKenna et al. (2023) identified attestation bias: LLMs predict entailment based on whether the hypothesis appears in training data, not whether the premise actually supports it. Random premise experiments show models maintain high entailment predictions when hypotheses are attested, proving they respond to memorized propositions rather than premise-hypothesis relationships.

Where do memorization errors arise in chain-of-thought reasoning?

The STIM framework identifies local, mid-range, and long-range memorization sources in CoT reasoning. Local memorization (based on preceding tokens) accounts for up to 67% of reasoning errors, especially as complexity increases and distributional shift occurs.

Can one model memorize and generalize better than two?

Wide & Deep models train memorization (cross-product features) and generalization (embeddings) together, allowing each component to specialize: the wide part becomes small because deep handles common cases, and deep doesn't overfit rare items because wide captures them. Ensembling requires both halves full-size.
