Why do preference-tuned models produce different diversity patterns in code versus creative writing?

This explores why preference tuning (RLHF and related) makes code outputs more uniform but creative writing outputs more varied — and what in each domain's reward structure drives the split.

The cleanest answer in the corpus is that the reward signal points in different directions in each domain. Code generation rewards convergence toward a correct, canonical solution, so tuning narrows lexical and syntactic variety; creative writing rewards stylistic distinctiveness, so tuning widens it (see "Does preference tuning always reduce diversity the same way?"). Diversity is not something RLHF uniformly destroys or preserves; it follows whatever each domain incentivizes.
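One common way to put a number on lexical variety is a distinct-n ratio: the share of n-grams in a set of samples that are unique. The sketch below is a minimal illustration, not the metric used in the cited note; the function name `distinct_n`, the whitespace tokenizer, and the toy samples are all assumptions.

```python
from typing import List


def distinct_n(samples: List[str], n: int = 2) -> float:
    """Return the fraction of n-grams across all samples that are unique.

    Values near 1.0 mean almost no repeated n-grams (high lexical variety);
    values near 0.0 mean the samples largely repeat each other.
    """
    total = 0
    unique = set()
    for text in samples:
        # Assumption: lowercase whitespace tokens stand in for a real tokenizer.
        tokens = text.lower().split()
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    return len(unique) / total if total else 0.0


# Hypothetical samples, invented for illustration only.
code_samples = [
    "def add(a, b): return a + b",
    "def add(a, b): return a + b",
    "def add(x, y): return x + y",
]
story_samples = [
    "the lighthouse kept its own counsel that winter",
    "rain taught the city how to whisper",
    "she folded the map until the ocean disappeared",
]

print("code distinct-2:", distinct_n(code_samples))
print("story distinct-2:", distinct_n(story_samples))
```

Running the same measurement on samples drawn before and after tuning, separately per domain, is what would show the narrowing in code and the widening in creative writing described above.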

The deeper mechanism shows up when you look at what RL post-training does to a model's output distribution. RL tends to amplify a single dominant format inherited from pretraining while suppressing the alternatives, often within the first epoch, and the winning format depends on model scale, not necessarily on which one performs best (see "Does RL training collapse format diversity in pretrained models?"). In code, where there's a sharp notion of "correct," that collapse onto one format reads as helpful convergence. In open-ended writing, the same collapse would be a loss, which is part of why the domains diverge under identical training procedures.
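A simple way to watch for this kind of collapse is to label each sampled output with its format and track the Shannon entropy of the label distribution across checkpoints. This is a hedged sketch: `format_entropy` is a hypothetical helper, and the format labels and counts are invented for illustration rather than taken from the cited experiments.

```python
import math
from collections import Counter
from typing import Iterable


def format_entropy(labels: Iterable[str]) -> float:
    """Shannon entropy (in bits) of a collection of format labels.

    Higher entropy means outputs are spread over several formats;
    entropy near zero means one format dominates.
    """
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical format labels for 100 samples at two checkpoints.
before_rl = ["markdown"] * 40 + ["latex"] * 35 + ["plain"] * 25
after_rl = ["markdown"] * 97 + ["latex"] * 2 + ["plain"] * 1

print("entropy before RL:", format_entropy(before_rl))
print("entropy after RL:", format_entropy(after_rl))
```

A sharp drop in entropy between checkpoints would be consistent with one format being amplified while the others are suppressed; it says nothing by itself about whether the surviving format is the best-performing one.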

There's a twist, though: more diversity in creative writing isn't automatically a good thing. Newer models actually diverge further from human lexical patterns even as they become harder to distinguish from human text, because RLHF optimizes for quality ratings rather than human-like writing (see "Why do newer AI models diverge further from human writing patterns?"). And when you compare models against each other rather than within one, the picture inverts again: dozens of independently trained models converge on near-identical responses to open-ended prompts, an "Artificial Hivemind" driven by overlapping training data and shared alignment recipes (see "Do different AI models actually produce diverse outputs?"). So creative-writing diversity can rise within a model while collapsing across models.
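The within-model versus across-model distinction can be made concrete by computing similarity at both levels. The sketch below uses Jaccard overlap on word sets as a deliberately crude similarity measure; the helper names, the model names, and the toy outputs are all assumptions, and this is not the method used in the cited study.

```python
from itertools import combinations
from typing import Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Word-set overlap between two texts, from 0.0 (disjoint) to 1.0 (identical sets)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def mean_pairwise(texts: List[str]) -> float:
    """Average similarity over all pairs of outputs from one model."""
    pairs = list(combinations(texts, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs) if pairs else 0.0


def within_and_across(outputs: Dict[str, List[str]]) -> Tuple[Dict[str, float], float]:
    """Return per-model within-model similarity and the mean across-model similarity."""
    within = {model: mean_pairwise(texts) for model, texts in outputs.items()}
    cross = [
        jaccard(a, b)
        for (_, texts_a), (_, texts_b) in combinations(outputs.items(), 2)
        for a in texts_a
        for b in texts_b
    ]
    across = sum(cross) / len(cross) if cross else 0.0
    return within, across


# Hypothetical outputs: each model varies internally, but the models mirror each other.
outputs = {
    "model_a": ["time is a river", "time is a thief"],
    "model_b": ["time is a river", "time is a thief"],
}
within, across = within_and_across(outputs)
print("within-model similarity:", within)
print("across-model similarity:", across)
```

Low within-model similarity combined with high across-model similarity is the pattern the paragraph describes: each model looks varied on its own while the population of models says much the same thing.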

The entanglement runs deeper than style. In writing assistance, the very preference optimization that produces polish also produces persona distortion: writers prefer the AI rewrite 63% of the time yet object to how it warps their voice, because polish and distortion are coupled at the model level and can't be cleanly separated (see "Can user preference guide AI writing tool alignment?"). That's the creative-writing analogue of the code problem: optimizing toward what raters click pulls the distribution somewhere the user didn't actually ask for.

If you want to chase the diversity-collapse thread further, two adjacent angles help. First, critique-in-the-loop training preserves solution diversity by preventing premature convergence during self-training (see "Do critique models improve diversity during training itself?"). Second, there's a case that creative output needs reasoning modes (combinational, exploratory, transformational) that conventional methods ignore entirely, which may be why ideation diversity collapses where code diversity merely tightens (see "Can LLMs reason creatively beyond conventional problem-solving?").

Sources (7 notes)

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Why do newer AI models diverge further from human writing patterns?

ChatGPT-4.5 and o4-mini show greater lexical diversity differences from human text than earlier models, yet human judges cannot reliably distinguish them. Training objectives like RLHF appear to optimize for quality ratings rather than human-like writing patterns.

Do different AI models actually produce diverse outputs?

INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.

Can user preference guide AI writing tool alignment?

Writers prefer AI rewrites 63% of the time but object to systematic persona distortions those same rewrites introduce. Mitigation studies show polish and distortion are entangled at the model level—preference optimization produces both simultaneously.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.

Can LLMs reason creatively beyond conventional problem-solving?

Research identifies combinational, exploratory, and transformational reasoning as distinct creative modes grounded in cognitive science. Existing LLM reasoning methods address only conventional problem-solving, leaving creative paradigms unaddressed and potentially explaining diversity collapse in ideation.
