Can population-level distributions shift usefully even when individual prediction fails?
This explores whether you can get something useful out of how a whole population of outputs is distributed, even in cases where you can't trust any single prediction. The corpus says yes, repeatedly, and the cleanest case is the implicit majority vote: a model trained on many imperfect, biased experts converges toward a consensus that beats any individual expert, because cross-entropy optimization denoises uncorrelated individual errors (see "Can models trained on many imperfect experts outperform each one?"). No single expert is reliable, yet the distribution they collectively shape lands somewhere better than all of them. That's the core mechanism: error at the individual level can cancel at the population level.
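The cancellation argument can be checked with a toy simulation. This is a minimal sketch under stated assumptions, not the setup from the cited note: it uses an explicit majority vote over binary decisions rather than the implicit vote of a trained model, and the expert count, per-expert accuracy, and the name `simulate_vote` are all illustrative choices.

```python
import random


def simulate_vote(n_experts=25, accuracy=0.6, n_trials=10_000, seed=0):
    """Compare one expert's accuracy with a majority vote on binary decisions.

    Assumption: every expert is right independently with the same probability.
    Independence is exactly what lets individual errors cancel.
    """
    rng = random.Random(seed)
    single_correct = 0
    majority_correct = 0
    for _ in range(n_trials):
        # True means "this expert got the decision right" on this trial.
        votes = [rng.random() < accuracy for _ in range(n_experts)]
        single_correct += votes[0]                        # track the first expert alone
        majority_correct += sum(votes) > n_experts / 2    # strict majority is right
    return single_correct / n_trials, majority_correct / n_trials


single_acc, majority_acc = simulate_vote()
print(f"single expert: {single_acc:.3f}, majority of 25: {majority_acc:.3f}")
```

With independent errors the vote should land well above the single expert. If the experts instead shared one source of error, the votes would move together and the gain would shrink, which is the condition the argument depends on.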
The flip side of this shows up in how single LLM outputs behave. A deterministic setting (temperature zero, fixed seed) gives you the same answer every time, but that answer is still just one draw from a probability distribution. Consistency is not reliability, as repeated-sampling tests across 100 runs reveal (see "Does setting temperature to zero actually make LLM outputs reliable?"). The individual prediction can be wrong, or unstable under resampling; what's informative is the shape of the distribution it came from. That reframes a lot of model behavior as a population-level property rather than a per-output guarantee.
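A small sketch makes the gap between consistency and reliability concrete. Everything here is hypothetical: the three candidate answers, their logits, and the helper names are invented for illustration, and greedy argmax stands in for "temperature zero". It is not the McDonald's omega procedure the cited note describes.

```python
import math
import random
from collections import Counter

# Hypothetical logits for three candidate answers to one prompt (illustrative values).
LOGITS = {"A": 2.0, "B": 1.8, "C": 0.5}


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities at the given temperature (> 0)."""
    scaled = {k: v / temperature for k, v in logits.items()}
    peak = max(scaled.values())                      # subtract max for numerical stability
    exps = {k: math.exp(v - peak) for k, v in scaled.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def greedy(logits):
    """Stand-in for temperature zero: always return the highest-logit answer."""
    return max(logits, key=logits.get)


def sample(logits, temperature, rng):
    """Draw one answer from the temperature-scaled distribution."""
    probs = softmax(logits, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


rng = random.Random(0)
greedy_runs = Counter(greedy(LOGITS) for _ in range(100))
sampled_runs = Counter(sample(LOGITS, 1.0, rng) for _ in range(100))
top_prob = softmax(LOGITS)["A"]

print("greedy :", dict(greedy_runs))
print("sampled:", dict(sampled_runs))
print(f"probability of the greedy answer: {top_prob:.2f}")
```

In this toy case the greedy decoder returns the same answer on all 100 runs even though the distribution gives that answer less than half the probability mass. Perfect repeatability says nothing about how contested the answer is.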
Several methods exploit exactly this. Proxy-tuning leaves base weights untouched and instead applies a distributional shift at decoding time, closing most of the alignment gap while preserving knowledge that direct fine-tuning corrupts (see "Can decoding-time tuning preserve knowledge better than weight fine-tuning?"). Keeping a model close to its base distribution (low KL drift) preserves its ability to keep learning, whereas parameter-only RL stalls when the task domain changes (see "Does staying close to the base model preserve learning ability?"). And most strikingly, behavioral traits transmit between models through data that has zero semantic relationship to the trait: the signal lives as a statistical signature in the distribution, not in any individual interpretable example (see "Can language models transmit hidden behavioral traits through unrelated data?"). In each, you're moving a distribution usefully without relying on any single example being meaningful.
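The first two ideas, shifting a distribution at decoding time and measuring how far it has drifted from the base, can be sketched together. The text above says only that proxy-tuning applies a distributional shift while leaving base weights untouched. The specific arithmetic below (base logits plus the difference between a tuned and an untuned small model) is an assumption about how such a shift is commonly formulated, and the logit values, function names, and KL direction are all hypothetical.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities."""
    peak = max(logits)                               # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_logits(base, expert, anti_expert):
    """Assumed decoding-time shift: base + (tuned small model - untuned small model).

    The base model's weights are never modified; only its output logits are offset.
    """
    return [b + (e - a) for b, e, a in zip(base, expert, anti_expert)]


def kl_divergence(p, q):
    """KL(p || q) in nats, used here as a rough measure of drift from the base."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical next-token logits over a four-token vocabulary.
base_logits = [2.0, 1.0, 0.5, 0.1]
expert_logits = [1.0, 1.5, 0.2, 0.0]        # small tuned model (assumed)
anti_expert_logits = [1.2, 0.9, 0.3, 0.0]   # same small model, untuned (assumed)

shifted_logits = proxy_tuned_logits(base_logits, expert_logits, anti_expert_logits)
base_probs = softmax(base_logits)
shifted_probs = softmax(shifted_logits)
drift = kl_divergence(shifted_probs, base_probs)

print("base   :", [round(p, 3) for p in base_probs])
print("shifted:", [round(p, 3) for p in shifted_probs])
print(f"KL(shifted || base) = {drift:.4f} nats")
```

The shifted distribution favors the token the tuned model prefers, while the KL term gives a single number for how far the result has moved from the base. Tracking that number is one plausible way to operationalize "low KL drift".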
But the corpus also marks the boundary, and it's worth knowing. For recommender systems, population-level concept-drift detection fails: preferences shift on individual timescales for individual reasons, so you need per-user modeling, not a global aggregate (see "Why do global concept drift methods fail for recommender systems?"). The lesson isn't "distributions always win." It's that population-level shifts help when individual errors are uncorrelated and cancel (the expert-voting case), and hurt when the individuals are genuinely heterogeneous and the aggregate smears over real differences (the recommender case). The unintuitive payoff: whether the crowd is wiser than the person depends on whether the individuals are aiming at the same target and whether their mistakes are independent.
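The recommender failure mode can be reproduced in a few lines. This is a toy sketch under strong assumptions: each simulated user has a single scalar preference that steps up or down by one unit at a user-specific time, with the two directions balanced across the population. The function name and all parameters are hypothetical.

```python
import random
import statistics


def simulate_preferences(n_users=200, n_steps=50, seed=0):
    """Return one preference time series per user.

    Assumption: half the users drift up and half drift down, each at their own time,
    so the individual changes are real but offset one another in aggregate.
    """
    rng = random.Random(seed)
    history = []
    for user in range(n_users):
        change_at = rng.randrange(5, n_steps - 5)    # user-specific change point
        direction = 1.0 if user % 2 == 0 else -1.0   # balanced up/down drift
        series = [
            (direction if t >= change_at else 0.0) + rng.gauss(0, 0.1)
            for t in range(n_steps)
        ]
        history.append(series)
    return history


history = simulate_preferences()

# Population view: how much did the global average move from start to end?
global_shift = abs(
    statistics.mean(s[-1] for s in history) - statistics.mean(s[0] for s in history)
)
# Per-user view: how much did the typical individual move?
per_user_shift = statistics.mean(abs(s[-1] - s[0]) for s in history)

print(f"shift in global average: {global_shift:.3f}")
print(f"mean per-user shift:     {per_user_shift:.3f}")
```

A detector watching only the global average would report almost no drift, while nearly every user has moved by about a full unit. Here the cancellation that helped in the expert-voting case erases the signal you care about.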
Sources (6 notes)
Generative models trained on many diverse experts with different biases converge toward consensus behavior through cross-entropy optimization. Low-temperature sampling reveals this implicit majority vote, which outperforms any single expert by denoising uncorrelated individual errors on critical decision states.
Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.
Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.
FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.
Research demonstrates that behavioral traits propagate between models via filtered data bearing no semantic relationship to the trait. The effect is model-specific, fails across different architectures, and persists despite rigorous filtering—indicating the mechanism embeds statistical signatures rather than semantic content.
User preferences shift on individual timescales for individual reasons, making population-level drift detection ineffective. Per-user temporal modeling that preserves long-term signals while discounting transient noise is required.