Why does diversity in LLM outputs mask sampling from community priors?
This explores a tension the corpus keeps circling: LLM outputs *look* varied — different wordings, different runs, different models — but that surface variety often masks the fact that the model is just sampling around a single shared distribution baked in by training and alignment (the 'community prior'), rather than producing genuinely independent or representative variety.
The cleanest demonstration is the 'Artificial Hivemind' effect. Across 70+ models and 26K open-ended prompts, models independently generate strikingly similar, sometimes identical, responses because they share overlapping training data and similar alignment procedures (Do different AI models actually produce diverse outputs?). So even ensembling different models, which feels like it should buy you diversity, mostly re-samples the same consensus. The variety is real at the token level and illusory at the distribution level.
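To make "real at the token level, illusory at the distribution level" concrete, here is a toy simulation. It is not the INFINITY-CHAT methodology. Three hypothetical models draw from one shared prior; the `SHARED_PRIOR` table, its archetype strings, and its probabilities are invented for illustration. Individual samples differ, but the total variation distance between the models' empirical distributions stays near zero, and pooling the ensemble never leaves the shared support.

```python
import random
from collections import Counter

# Hypothetical shared "community prior": invented response archetypes for one
# open-ended prompt, with made-up probabilities. Every simulated model uses it.
SHARED_PRIOR = {
    "time is a river": 0.55,
    "time is a thief": 0.25,
    "time is a teacher": 0.15,
    "time is a spiral": 0.05,
}


def sample_model(seed: int, n: int) -> list[str]:
    """Simulate one model answering the same prompt n times from the shared prior."""
    rng = random.Random(seed)
    return rng.choices(list(SHARED_PRIOR), weights=list(SHARED_PRIOR.values()), k=n)


def total_variation(a: list[str], b: list[str]) -> float:
    """Total variation distance between the empirical distributions of a and b."""
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[k] / len(a) - cb[k] / len(b)) for k in set(ca) | set(cb))


# Three "different" models: different seeds, same underlying prior.
models = {f"model_{i}": sample_model(seed=i, n=2000) for i in range(3)}

# Surface view: individual samples from each model.
for name, outputs in models.items():
    print(name, outputs[:3])

# Distribution view: pairwise total variation distances stay close to zero.
names = list(models)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(a, b, f"TV = {total_variation(models[a], models[b]):.3f}")

# Pooling the ensemble never reaches outside the shared prior's support.
print(set().union(*models.values()) <= set(SHARED_PRIOR))
```

With independent draws from one prior, an ensemble of three models is statistically equivalent to sampling one model three times as often. That is the sense in which ensembling "re-samples the same consensus."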
The persona work shows the same illusion from the opposite direction. When you run one persona prompt repeatedly, the output varies a lot, but that variance across runs matches or exceeds the variance across genuinely different personas (Why do LLM persona prompts produce inconsistent outputs across runs?). In other words, the spread you see isn't the model channeling distinct viewpoints; it's the model's own uncertainty sloshing around a single prior. The diversity is noise wearing the costume of representation. That's exactly the masking the question names: more variety on screen, not more underlying coverage.
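The comparison in that finding can be sketched as a simple within/between variance split. This sketch assumes each run yields one numeric rating. The function name `persona_variance_split` and the ratings are hypothetical and for illustration only, not data from the cited study, which may use a different statistic.

```python
from statistics import mean, pvariance


def persona_variance_split(scores: dict[str, list[float]]) -> tuple[float, float]:
    """Return (within-persona variance, between-persona variance).

    within:  average variance across repeated runs of the same persona prompt
    between: variance of the per-persona mean ratings
    """
    within = mean(pvariance(runs) for runs in scores.values())
    between = pvariance([mean(runs) for runs in scores.values()])
    return within, between


# Hypothetical 1-5 ratings from running three persona prompts five times each
# on the same annotation item. The numbers are illustrative, not measured.
scores = {
    "persona_a": [2, 4, 3, 5, 1],
    "persona_b": [3, 5, 2, 4, 2],
    "persona_c": [4, 2, 3, 3, 5],
}

within, between = persona_variance_split(scores)
print(f"within-persona variance:  {within:.2f}")
print(f"between-persona variance: {between:.2f}")
if within >= between:
    print("Run-to-run noise swamps the differences between personas.")
```

When the within term dominates, the apparent disagreement between personas is mostly resampling noise. That is why such outputs are a poor stand-in for real annotator disagreement.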
Why is the underlying prior so narrow in the first place? Post-training actively compresses it. RL post-training amplifies one dominant pretraining format within the first epoch and suppresses the alternatives, and which format wins depends on model scale, not necessarily on quality (Does RL training collapse format diversity in pretrained models?). Outcome-based RL goes further. It concentrates probability mass on correct trajectories for solved problems while also reducing diversity on problems the model hasn't solved (Does outcome-based RL diversity loss spread across unsolved problems?). The 'community prior' is partly an artifact of alignment procedures that all push in the same direction. The squeeze is not uniform, though. Preference tuning *reduces* lexical diversity in code while *increasing* it in creative writing, so its direction depends on what each domain rewards (Does preference tuning always reduce diversity the same way?).
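Here is a toy illustration of why reward maximization drains diversity. It is not the cited experiments' setup. It treats four pretraining "formats" as the options of a softmax policy, uses invented reward values, and runs exact expected-reward policy-gradient ascent. The function `expected_reward_ascent` and all numbers are hypothetical. The sketch shows only the generic sharpening mechanism. It does not model the scale dependence of the winning format or the spill-over onto unsolved problems.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p: list[float]) -> float:
    """Shannon entropy in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def expected_reward_ascent(
    logits: list[float], rewards: list[float], lr: float, steps: int
) -> tuple[list[float], list[float]]:
    """Exact policy-gradient ascent on E[r] for a softmax policy over formats.

    The gradient of E[r] with respect to logit i is p_i * (r_i - E[r]).
    Returns the final probabilities and the entropy after every step.
    """
    logits = list(logits)
    entropies = []
    for _ in range(steps):
        p = softmax(logits)
        baseline = sum(pi * ri for pi, ri in zip(p, rewards))
        logits = [x + lr * pi * (ri - baseline) for x, pi, ri in zip(logits, p, rewards)]
        entropies.append(entropy(softmax(logits)))
    return softmax(logits), entropies


# Four hypothetical formats, equally likely at the start; format 1 is the
# runner-up. Reward values are invented for illustration.
rewards = [0.9, 0.7, 0.6, 0.2]
probs, entropies = expected_reward_ascent([0.0] * 4, rewards, lr=2.0, steps=500)
print("final format probabilities:", [round(p, 3) for p in probs])
print(f"entropy: {math.log(4):.3f} nats at start, {entropies[-1]:.3f} nats after training")
```

Each step widens the logit gap between the top format and every alternative, including the runner-up. Probability mass drains toward one mode even though nothing in the objective penalizes the others directly.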
The most consequential version of the masking is cultural. Mechanistic analysis finds that low-resource cultures like Ethiopia and Algeria are internally represented *through* high-resource cultural proxies. This is a one-way flattening that persists even when the model can produce a correct surface answer (Do LLMs represent low-resource cultures through dominant cultural proxies?). So a model can hand you a plausible, locally flavored response while, underneath, it's sampling from a dominant prior and routing the 'other' culture through it. The right-sounding output is precisely what hides the missing representation. A parallel failure shows up in social simulation. Models look socially competent when one model secretly controls all parties, but collapse once agents hold genuinely private information. The apparent competence was riding on grounding work the omniscient setup let it skip (Why do LLMs fail when simulating agents with private information?).
The quietly useful payoff: diversity is not a free signal you can read off the outputs. Distinguishing genuine variety from prior-sampling takes work the surface won't show you. That is why DARLING, which measures semantic diversity directly with a learned classifier and rewards it during training instead of trusting raw output spread, ends up improving both diversity and quality at once (Can diversity optimization improve quality during language model training?). If you want diverse outputs that actually represent something, you have to optimize for it explicitly. Left alone, the model will give you the comforting appearance of variety drawn from a single well.
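Here is a minimal sketch of a diversity-aware reward in the spirit of that approach. It rests on several assumptions. The learned semantic classifier is replaced by a hypothetical `same_first_word` rule. Diversity is scored as the fraction of other batch responses in a different semantic class. Quality and diversity are combined multiplicatively, which is one plausible reading of "jointly optimizes" and not necessarily DARLING's exact formula. All names, responses, and quality scores are hypothetical.

```python
from typing import Callable


def diversity_scores(responses: list[str], same_class: Callable[[str, str], bool]) -> list[float]:
    """For each response, the fraction of the other responses that are semantically distinct."""
    n = len(responses)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, a in enumerate(responses):
        distinct = sum(1 for j, b in enumerate(responses) if j != i and not same_class(a, b))
        scores.append(distinct / (n - 1))
    return scores


def diversity_aware_rewards(
    responses: list[str], qualities: list[float], same_class: Callable[[str, str], bool]
) -> list[float]:
    """Multiply each quality score by its diversity score (an assumption of this sketch)."""
    div = diversity_scores(responses, same_class)
    return [q * d for q, d in zip(qualities, div)]


# Hypothetical stand-in for a learned semantic-equivalence classifier:
# two responses count as "the same" if they share their first word.
def same_first_word(a: str, b: str) -> bool:
    return a.split()[0].lower() == b.split()[0].lower()


batch = [
    "Rivers carve time into stone",
    "Rivers remember every rain",
    "Clocks are polite liars",
    "Seeds keep the calendar underground",
]
quality = [0.9, 0.9, 0.7, 0.6]  # hypothetical quality-model scores
print(diversity_aware_rewards(batch, quality, same_first_word))
```

The two "Rivers" responses have the highest quality scores, but each is discounted for being redundant with the other. The distinct "Clocks" response ends up with the largest reward. This pressure pushes a policy away from re-sampling one mode.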
Sources (8 notes)
INFINITY-CHAT analyzed 70+ models across 26K open-ended queries and found an "Artificial Hivemind" effect: models independently generate strikingly similar or identical responses due to overlapping training data and alignment procedures, undermining the diversity benefits of model ensembles.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
RL that rewards only final answer correctness sharpens the policy globally, concentrating probability mass on correct trajectories for solved problems while simultaneously reducing diversity on unsolved ones. Historical exploration (training diversity via UCB-style bonuses) and batch exploration (test-time diversity via repetition penalties) require structurally different mechanisms.
RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.
Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
DARLING jointly optimizes for quality and semantic diversity using a learned classifier, finding that diversity rewards catalyze exploration and produce higher-quality outputs than quality-only baselines across both creative and mathematical tasks.