INQUIRING LINE

Why does preference tuning reduce diversity in code but increase it in creative tasks?

This explores why the *same* training process — preference tuning / RLHF — pushes code outputs toward sameness while pushing creative writing toward variety, and what that tells us about diversity not being a fixed property of the model.


This reads the question as being about preference tuning's *direction-flipping* effect on output diversity across domains. The short answer the corpus offers is that preference tuning doesn't have a diversity effect of its own. It has a *convergence* effect, and whether that shows up as less diversity or more depends on the shape of the domain's reward landscape. The clearest statement of this is that RLHF reduces lexical and syntactic diversity in code but increases it in creative writing, because code rewards convergence toward a small set of correct solutions while creative writing rewards stylistic distinctiveness (see "Does preference tuning always reduce diversity the same way?"). Same optimizer, opposite-shaped target: in code the high-reward region is a narrow basin, so the model collapses into it; in creative work the high-reward region is broad and favors distinctive texture, so the model spreads out to cover it.

There's a deeper reframe underneath that's worth pulling up: the word "diversity" is doing two different jobs. Base models can *look* more diverse simply because their variance spills into incoherent, low-quality space. When diversity is measured only among outputs that pass a quality bar, preference-tuned models generate *more* semantic diversity than base models; the base model's apparent richness was partly just noise (see "Does preference tuning actually reduce the diversity of model outputs?"). So part of the code-vs-creative gap may be a measurement artifact: in code, 'incoherent' and 'wrong' are easy to detect and get filtered out, leaving the convergent core; in creative writing there's no single correctness gate, so distinctiveness survives as legitimate variety.
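The measurement point can be made concrete with a minimal sketch. It compares a diversity score over all samples with the same score over only the samples that pass a quality check. Everything named here is an assumption for illustration: `distinct_n` is a simple lexical proxy (the note refers to *semantic* diversity, which would need an embedding or classifier instead), and `passes_quality` is a hypothetical predicate standing in for whatever quality bar is used.

```python
from typing import Callable, Dict, List


def distinct_n(texts: List[str], n: int = 2) -> float:
    """Share of unique n-grams among all n-grams in `texts` (0.0 if there are none).

    Assumption: whitespace tokenization and a lexical metric, used here only
    as a stand-in for a semantic diversity measure.
    """
    grams = []
    for text in texts:
        tokens = text.split()
        grams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return len(set(grams)) / len(grams) if grams else 0.0


def diversity_report(
    samples: List[str],
    passes_quality: Callable[[str], bool],  # hypothetical quality gate
    n: int = 2,
) -> Dict[str, float]:
    """Report diversity over all samples and over quality-passing samples only."""
    kept = [s for s in samples if passes_quality(s)]
    return {
        "all": distinct_n(samples, n),
        "quality_passing": distinct_n(kept, n),
        "pass_rate": len(kept) / len(samples) if samples else 0.0,
    }
```

Running the same report for a base model and a preference-tuned model would show whether a base model's higher "all" score survives the quality filter, which is the comparison the note describes.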

The convergence mechanism itself shows up well beyond code. In controlled experiments, RL post-training consistently amplifies one output format from pretraining within the first epoch and collapses the alternatives, and which format wins depends on model scale, not necessarily on performance (see "Does RL training collapse format diversity in pretrained models?"). The same entropy-collapse dynamic documented in reasoning also squeezes exploration in search agents, where policies converge on narrow reward-maximizing strategies and SFT on diverse demonstrations preserves breadth (see "Does reinforcement learning squeeze exploration diversity in search agents?"). Read together, these say the 'reduce diversity in code' half of the question is the *default* behavior of reward optimization; the creative-writing exception is the interesting case, not the rule.
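One way to watch for this kind of collapse is to track the Shannon entropy of the output-format distribution across training checkpoints. The sketch below is illustrative only: the format labels are hypothetical, and how outputs get assigned a format label is left as an assumption outside the code.

```python
import math
from collections import Counter
from typing import Iterable


def format_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution over format labels.

    Assumption: each sampled output has already been tagged with a format
    label such as "code", "latex", or "prose" (hypothetical names).
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

An entropy that falls toward zero early in training would be consistent with one format taking over while the alternatives disappear.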

Which raises the constructive question the corpus also answers: can you keep the convergence toward quality without the collapse? The notes suggest yes, if diversity is made an explicit reward rather than a side effect. Jointly optimizing for quality *and* semantic diversity during RL (the DARLING approach) catalyzes exploration and produces higher-quality outputs than quality-only training, and notably it works across *both* creative and mathematical tasks (see "Can diversity optimization improve quality during language model training?"). A related effect appears with step-level critique in the training loop, which counteracts 'tail narrowing' and prevents premature convergence across self-training iterations (see "Do critique models improve diversity during training itself?"). The takeaway: the code/creative split isn't a law of preference tuning. It's what happens when diversity is left implicit, and these results suggest it can be counteracted by naming diversity as part of the target.
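As a rough illustration of what "naming diversity as part of the target" could look like, the sketch below scores each sample in a group by its quality multiplied by how semantically distinct it is from the other samples. This is a sketch under stated assumptions, not the method from the notes: the multiplicative combination is an assumed design choice, and `same_meaning` is a hypothetical predicate standing in for the learned classifier the DARLING note mentions.

```python
from typing import Callable, List, Sequence


def joint_rewards(
    samples: Sequence[str],
    quality: Sequence[float],
    same_meaning: Callable[[str, str], bool],  # hypothetical equivalence check
) -> List[float]:
    """Combine per-sample quality with a novelty term computed within the group.

    Novelty is the fraction of the other samples that are NOT judged
    equivalent to this one. Assumption: quality * novelty is the combination
    rule; other rules (for example a weighted sum) are equally plausible.
    """
    rewards = []
    for i, sample in enumerate(samples):
        others = [t for j, t in enumerate(samples) if j != i]
        if others:
            novelty = sum(not same_meaning(sample, t) for t in others) / len(others)
        else:
            novelty = 1.0  # a lone sample has nothing to duplicate
        rewards.append(quality[i] * novelty)
    return rewards
```

With two identical high-quality samples and one distinct, slightly lower-quality sample, the distinct sample ends up with the highest reward, which is the pressure toward exploration the note describes.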


Sources (6 notes)

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Does preference tuning actually reduce the diversity of model outputs?

When diversity is measured among quality-passing outputs rather than all outputs, preference-tuned models generate greater semantic diversity than base models. Base models appear more diverse only because their variance spans incoherent space.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does reinforcement learning squeeze exploration diversity in search agents?

RL training compresses behavioral diversity in search agents through the same entropy collapse mechanism documented in reasoning—policies converge on narrow reward-maximizing strategies. SFT on diverse demonstrations preserves exploration breadth, suggesting diversity-preservation techniques are essential for RL search scaling.

Can diversity optimization improve quality during language model training?

DARLING jointly optimizes for quality and semantic diversity using a learned classifier, finding that diversity rewards catalyze exploration and produce higher-quality outputs than quality-only baselines across both creative and mathematical tasks.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.
