
Does training on AI-generated content permanently degrade model quality?

When generative models train on outputs from previous models, do the resulting models lose rare patterns permanently? The question matters because future training data will inevitably contain synthetic content.

Note · 2026-02-22 · sourced from Training Fine Tuning

When generative models train on data that includes outputs from previous generative models, the resulting models lose the tails of the original distribution. This is model collapse — and it is irreversible. The "Curse of Recursion" paper demonstrates this across architectures: Variational Autoencoders, Gaussian Mixture Models, and LLMs all exhibit the same failure mode.

The mechanism is straightforward. Generative models approximate the training distribution, but the approximation systematically underweights rare events. When the next generation of models trains on a mixture of real and generated data, the generated portion has already lost tail information. Each successive generation compounds the loss. After a few iterations, the distribution has collapsed to its modes — the common, the average, the expected — and the rare, unusual, or minority patterns are gone.
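A toy simulation makes the mechanism concrete. This sketch is not from the paper; the vocabulary size, sample count, and Zipf-style weights are all illustrative assumptions. Each "generation" fits an empirical distribution to a finite sample from the previous generation's model, standing in for training on generated data. Tokens that happen not to be sampled get probability zero and can never reappear, so the support of the distribution only shrinks:

```python
import random
from collections import Counter

random.seed(0)

# Illustrative setup: a long-tailed "true" distribution over 1000 tokens
# (Zipf-like weights), standing in for real training data.
VOCAB = 1000
SAMPLES_PER_GEN = 500
true_weights = [1.0 / (rank + 1) for rank in range(VOCAB)]
vocab = list(range(VOCAB))

def fit_empirical(tokens, vocab_size):
    """'Train' a model: the empirical distribution of the sample.
    Tokens never observed get probability zero -- the tail is lost."""
    counts = Counter(tokens)
    return [counts.get(t, 0) for t in range(vocab_size)]

dist = true_weights
support_sizes = []
for generation in range(10):
    # Generate training data from the current model, then fit the next
    # model on only what was generated.
    data = random.choices(vocab, weights=dist, k=SAMPLES_PER_GEN)
    dist = fit_empirical(data, VOCAB)
    support_sizes.append(sum(1 for w in dist if w > 0))

# Support is monotonically non-increasing: a token with zero weight
# can never be sampled again, so the tail disappearance is one-way.
print(support_sizes)
```

In this simplified setting the loss is strictly irreversible by construction, which mirrors the recursive unfiltered scenario the paper studies; real training dynamics are noisier, but the one-way ratchet on unobserved tail events is the same.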

This matters for the current LLM ecosystem because model-generated content is increasingly prevalent on the web. Future training corpora will inevitably contain LLM outputs unless specifically filtered. The implication: the value of data collected about genuine human interactions — with all their diversity, inconsistency, and long-tail phenomena — will increase, not decrease, as LLMs proliferate. This connects to the related question of how quickly errors compound during model self-training: model collapse reinforces the self-training degradation finding, but at the broader ecosystem level rather than within a single training loop.

The tail disappearance is particularly concerning for domains where rare cases matter: medical diagnosis (unusual presentations), legal reasoning (precedent-setting edge cases), scientific discovery (anomalous observations). A model that has lost its tails is a model that has lost its ability to represent the unusual — precisely the cases where human judgment is most needed and where AI assistance would be most valuable.

The model collapse debate is not settled. The SDSD (Self-Directed Synthetic Dialogues) paper frames model collapse as "debated, and likely depends on the exact training example and models being used," citing Gerstgrasser et al. 2024 and Feng et al. 2024 as counter-evidence. This suggests model collapse may be conditional rather than universal — the specific synthetic data generation method, the ratio of synthetic to real data, and the model architecture may determine whether collapse occurs. The irreversibility framing above may overstate the case for all conditions while remaining accurate for the recursive unfiltered training scenario the original paper studied.
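The mixing-ratio condition can be illustrated by extending the toy simulation above. This is a sketch under illustrative assumptions (the vocabulary, sample sizes, and 50% real fraction are arbitrary, not taken from any of the cited papers): when each generation's training data includes a fraction of fresh samples from the true distribution, unseen tail tokens can re-enter, and the support stops collapsing toward the modes:

```python
import random
from collections import Counter

random.seed(0)

VOCAB = 1000
SAMPLES_PER_GEN = 500
GENERATIONS = 10
true_weights = [1.0 / (rank + 1) for rank in range(VOCAB)]
vocab = list(range(VOCAB))

def run(real_fraction):
    """Recursive training where each generation's corpus mixes
    fresh real data with the previous model's outputs."""
    dist = true_weights
    for _ in range(GENERATIONS):
        n_real = int(SAMPLES_PER_GEN * real_fraction)
        n_synth = SAMPLES_PER_GEN - n_real
        # Fresh real data can reintroduce tail tokens the model lost.
        data = random.choices(vocab, weights=true_weights, k=n_real)
        if n_synth:
            data += random.choices(vocab, weights=dist, k=n_synth)
        counts = Counter(data)
        dist = [counts.get(t, 0) for t in range(VOCAB)]
    return sum(1 for w in dist if w > 0)  # final support size

unfiltered = run(0.0)  # pure recursive training on model outputs
mixed = run(0.5)       # half of each generation's corpus is real data
print(unfiltered, mixed)
```

In runs of this toy model the mixed regime retains a substantially larger support than the purely recursive one, which is consistent with the conditional framing: whether collapse occurs depends on how much real data keeps flowing into each generation's corpus.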

Subliminal learning extends model collapse to behavioral traits. "Subliminal Learning" demonstrates that models transmit behavioral traits through semantically unrelated data — a teacher model with some trait (e.g., liking owls, misalignment) generates number sequences, and a student trained on these sequences inherits the trait, even after filtering removes explicit references. The transmission is model-specific: traits transfer within the same model family but fail cross-model (GPT-4.1 nano teacher → GPT-4.1 nano student works, → Qwen2.5 student fails). This implies model-specific patterns rather than semantically meaningful content. For model collapse, subliminal learning adds a hidden channel: distillation can propagate unintended traits even through data that appears clean. The combination of tail distribution loss (model collapse) and hidden behavioral transmission (subliminal learning) means that the synthetic data problem is more severe than previously understood — you lose diversity and potentially import unwanted behaviors.


Source: Training Fine Tuning; enriched from Flaws

Original note title: training on model-generated content causes irreversible model collapse through tail distribution disappearance