Reinforcement Learning for LLMs

How do quality, diversity, and complexity affect synthetic data differently?

When training models on synthetic data, do quality, diversity, and complexity each play distinct roles in how well models generalize? Understanding their separate effects could explain why current optimization strategies fail.

Note · 2026-05-03 · sourced from Data

Synthetic data generation methods have proliferated rapidly, yet few studies are directly comparable, because each method varies seeds, prompts, filters, and tasks simultaneously. The QDC framework proposes a cleaner basis for comparison: examine the quality, diversity, and complexity of the resulting synthetic data, and trace how each characteristic maps to downstream model performance.

Three findings disentangle effects that previous work conflated. Quality is essential for in-distribution generalization — models learn to produce acceptable outputs only when training samples meet specification fidelity. Diversity is essential for out-of-distribution generalization — without sufficient variety in training, the model has no basis for handling distribution shifts. Complexity is beneficial for both, because complex examples push the model's representational capacity rather than merely confirming existing capability.

A critical structural observation follows: there is a quality-diversity trade-off in training data. Maximizing quality by tightening rejection criteria narrows the distribution; maximizing diversity broadens the distribution but admits more low-fidelity samples. The trade-off is irreducible at the level of any single sample: a sample cannot simultaneously be maximally distant from the typical case and maximally compliant with the typical specification.
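The trade-off can be made concrete with a toy simulation. In this sketch (all names and the quality model are assumptions, not from the source), each synthetic sample is reduced to a scalar "deviation" from the typical case, quality is modeled as falling off with that deviation, and diversity is the spread of whatever survives a quality filter. Tightening the rejection threshold raises mean quality and shrinks diversity in lockstep:

```python
import random
import statistics

random.seed(0)

# Hypothetical setup: each sample's deviation from the typical case,
# drawn from a standard normal. Quality is assumed to decay linearly
# with absolute deviation; diversity is the std dev of retained samples.
samples = [random.gauss(0.0, 1.0) for _ in range(10_000)]

def quality(x):
    """Assumed quality score: 1.0 at the spec center, lower further out."""
    return 1.0 - abs(x) / 3.0

def retained_stats(quality_threshold):
    """Apply a rejection filter and measure what survives."""
    kept = [x for x in samples if quality(x) > quality_threshold]
    mean_quality = statistics.mean(quality(x) for x in kept)
    diversity = statistics.pstdev(kept)
    return len(kept), mean_quality, diversity

for thr in (0.0, 0.5, 0.8):
    n, q, d = retained_stats(thr)
    print(f"threshold={thr:.1f}  kept={n:5d}  mean_quality={q:.2f}  diversity={d:.2f}")
```

Running this shows mean quality rising and diversity falling as the threshold tightens, which is the sample-level version of the trade-off described above.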

The most consequential implication is for self-improvement. Models are typically evaluated and optimized only for output quality. Quality-only training narrows output diversity; the narrowed outputs then become the synthetic data for the next training round, which has even less diversity, and so on. Self-improvement degrades because the data generator collapses toward the model's existing distribution, the model collapse mechanism in slow motion. Balancing QDC is therefore not a polish concern but a structural prerequisite for self-improvement to work: a system that does not preserve diversity cannot bootstrap beyond its current capabilities.
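The collapse loop can be caricatured in a few lines. In this sketch (a deliberately crude assumption, not the source's method), the "model" is just a Gaussian output distribution; each round we sample from it, keep only the highest-quality half (closest to a spec center at 0), and refit the distribution on the survivors. The spread shrinks every round:

```python
import random
import statistics

random.seed(1)

def self_train(rounds=5, n=5_000, keep_frac=0.5):
    """Quality-only self-training loop over a toy Gaussian 'model'.

    Each round: generate outputs, keep the fraction closest to the
    spec center (the quality-only filter), refit mean and std dev on
    the survivors, and record the remaining diversity (sigma).
    """
    mu, sigma = 0.0, 1.0
    history = [sigma]
    for _ in range(rounds):
        outputs = [random.gauss(mu, sigma) for _ in range(n)]
        outputs.sort(key=abs)                  # rank by closeness to spec
        kept = outputs[: int(n * keep_frac)]   # quality-only rejection
        mu = statistics.mean(kept)
        sigma = statistics.pstdev(kept)        # refit on filtered data
        history.append(sigma)
    return history

sigmas = self_train()
print([round(s, 3) for s in sigmas])  # spread shrinks round over round
```

Each round multiplies the remaining spread by a factor well below one, so diversity decays geometrically even though every individual round looks like a reasonable quality filter. This is the structural failure the note describes: nothing in the loop ever rewards or restores diversity.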

