Does preference tuning always reduce diversity the same way?
Explores whether the standard narrative that RLHF reduces model diversity holds equally across different task domains, or if the effect varies by what the domain rewards.
Evaluating the Diversity and Quality of LLM Generated Content reports a clean finding that the standard "RLHF reduces diversity" narrative cannot accommodate: the direction of the effect depends on the domain. In programming tasks, preference tuning consistently reduces lexical and syntactic diversity while preserving semantic diversity. In open-ended creative writing, preference tuning increases lexical and syntactic diversity, including stylistic variety.
The pattern makes sense in retrospect. Code has a sharp, narrow definition of "correct" — semantically equivalent programs converge on a small set of valid syntactic forms. Preference tuning pushes models toward correctness, which in code means pushing toward a smaller surface lexicon. Creative writing has the opposite property: "good" creative writing rewards distinctive word choice, varied sentence structure, stylistic range. Preference tuning pushes models toward those rewards, which manifests as broader lexical and syntactic variety.
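To make "surface lexicon" concrete, one common proxy for lexical diversity is distinct-n: the share of unique n-grams among all n-grams in a set of samples. The sketch below is only an illustration, not necessarily the metric the paper uses. The whitespace tokenization, the function names, and the sample strings are all assumptions invented for this example.

```python
from typing import Iterable, List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(samples: Iterable[str], n: int = 2) -> float:
    """Unique n-grams divided by total n-grams across all samples.

    Whitespace tokenization is an assumption made for brevity.
    """
    all_ngrams: List[Tuple[str, ...]] = []
    for text in samples:
        all_ngrams.extend(ngrams(text.split(), n))
    if not all_ngrams:
        return 0.0
    return len(set(all_ngrams)) / len(all_ngrams)


# Hypothetical samples, invented for illustration only.
code_samples = [
    "for i in range(n): total += i",
    "for i in range(n): total += i",
    "for j in range(n): total += j",
]
story_samples = [
    "The lighthouse hummed beneath a bruised sky",
    "Rain stitched the harbor to the hills",
    "She counted gulls until the tide forgot her",
]

code_score = distinct_n(code_samples, n=2)
story_score = distinct_n(story_samples, n=2)
print(f"code distinct-2:  {code_score:.2f}")
print(f"story distinct-2: {story_score:.2f}")
```

A surface metric like this says nothing about semantic diversity: the three code samples compute the same thing, and a separate measure (for example, comparing program behavior) would be needed to capture that axis.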
This breaks the assumption that diversity is a single property of the model. A model that has been preference-tuned is not "less diverse" in the absolute sense — it is differently shaped depending on what the domain rewards. For code-heavy applications, the lexical compression is a feature (consistent style) or a bug (less exploration of solution space) depending on what you want. For creative applications, the lexical expansion is a clear win.
The implication for evaluation is that benchmarks measuring diversity in a domain-agnostic way will report misleading aggregate numbers. A model at the 60th percentile on "creative writing diversity" and the 90th percentile on "code diversity" averages to roughly the 75th percentile, a single figure that describes neither domain. Domain-stratified diversity evaluation is necessary to characterize what preference tuning has done to a model.
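The masking effect of a pooled score can be shown with a few lines of arithmetic. This is a minimal sketch under stated assumptions: the scores are invented numbers on a 0 to 100 scale, chosen so that the two domains move in opposite directions, and the function names are hypothetical rather than taken from any benchmark.

```python
from statistics import mean
from typing import Dict


def pooled_score(scores_by_domain: Dict[str, float]) -> float:
    """Domain-agnostic aggregate: a simple mean over domains."""
    return mean(scores_by_domain.values())


def stratified_deltas(
    before: Dict[str, float], after: Dict[str, float]
) -> Dict[str, float]:
    """Per-domain change in diversity score (after minus before)."""
    return {domain: after[domain] - before[domain] for domain in before}


# Hypothetical lexical-diversity scores, invented for illustration only.
base_model = {"code": 60.0, "creative_writing": 60.0}
tuned_model = {"code": 45.0, "creative_writing": 75.0}

pooled_delta = pooled_score(tuned_model) - pooled_score(base_model)
deltas = stratified_deltas(base_model, tuned_model)

print(f"pooled change: {pooled_delta:+.1f}")
for domain, delta in deltas.items():
    print(f"{domain}: {delta:+.1f}")
```

With these made-up numbers the pooled score does not move at all, while the per-domain view shows a 15-point drop in code and a 15-point gain in creative writing. Reporting the stratified deltas alongside any aggregate keeps that structure visible.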
For builders, this dissolves part of the "should we preference-tune for creativity?" debate. The answer depends on whether the desired creativity is the convergent kind (programs that work) or the divergent kind (stories that distinguish themselves) — and on those terms, preference tuning is well-aligned with the second.
Related concepts in this collection
- Does preference tuning actually reduce the diversity of model outputs?
The field assumes RLHF and DPO reduce diversity, but this assumption rests on measuring all outputs equally. What happens if we only count diverse outputs that meet quality thresholds?
same paper, the broader metric reframing this finding falls under
- Why aren't bigger models better for generating diverse outputs?
When generating many unique outputs within a fixed budget, does model size actually matter? Exploring whether the conventional wisdom of using larger models holds for diversity-focused tasks.
same paper, the parameter-efficiency dimension
Original note title
preference tuning diversity effects are domain-dependent — RLHF reduces lexical-syntactic diversity in code while increasing it in creative writing