INQUIRING LINE

Why do language models approximate collective human judgment better than individuals?

This explores why an LLM can match the average of a crowd's judgment more reliably than it matches any one person — and what that gap reveals about how these models actually learn.


The clearest evidence comes from work on social norms: GPT-4.5 out-predicted *every* individual human at judging whether behavior was socially appropriate across 555 scenarios (see "Can AI learn social norms better than humans?"). That sounds like superhuman insight, but the mechanism is humbler. A model trained on the aggregate textual output of millions of people converges on the *consensus* view. Any single human is noisier than the average, since each of us carries idiosyncratic blind spots, so beating an individual is partly just beating noise with a smoothed mean.
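The averaging argument can be made concrete with a toy simulation. This is a minimal sketch under stated assumptions, not the method of the cited study: each simulated rater sees a shared consensus signal plus personal noise, and the "model" is assumed to be a lower-noise estimate of that same consensus. The function names, noise levels, and scoring rule (correlation with the average of the other raters) are all illustrative choices.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_scenarios=200, n_raters=50, rater_noise=1.0,
             model_noise=0.3, seed=0):
    """Return (model_score, rater_scores) for a toy norm-rating task.

    All parameters are hypothetical; they are not taken from any study.
    """
    rng = random.Random(seed)
    # Shared "consensus" appropriateness of each scenario.
    consensus = [rng.gauss(0, 1) for _ in range(n_scenarios)]
    # Each rater = consensus + idiosyncratic noise.
    ratings = [[c + rng.gauss(0, rater_noise) for c in consensus]
               for _ in range(n_raters)]
    # Assumption: the model is a smoother (lower-noise) read of the consensus.
    model = [c + rng.gauss(0, model_noise) for c in consensus]

    totals = [sum(r[s] for r in ratings) for s in range(n_scenarios)]

    # Score each rater against the mean of everyone else (leave-one-out),
    # so a rater is never compared with their own rating.
    rater_scores = []
    for r in ratings:
        others = [(totals[s] - r[s]) / (n_raters - 1)
                  for s in range(n_scenarios)]
        rater_scores.append(pearson(r, others))

    # Score the model against the mean of all raters.
    crowd_mean = [t / n_raters for t in totals]
    model_score = pearson(model, crowd_mean)
    return model_score, rater_scores


if __name__ == "__main__":
    model_score, rater_scores = simulate()
    print("model vs crowd:", round(model_score, 3))
    print("best single rater vs crowd:", round(max(rater_scores), 3))
```

With the default settings the model's agreement with the crowd should come out above every simulated rater's, even though the model holds no information the raters lack collectively. Shrinking `rater_noise` or raising `model_noise` narrows the gap, which is the sense in which beating individuals is partly a matter of beating noise.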

The same pattern shows up when models are tuned on actual behavioral data: LLMs finetuned on psychology experiments predict human decisions better than the theory-driven cognitive models built to explain them, and they even encode individual differences in their embeddings (see "Can language models learn to model human decision making?"). So the collective-vs-individual story isn't that models *can't* represent persons; it's that they're optimized to reproduce population-level regularities, and the individual signal is weaker and harder to anchor.

Where it breaks is exactly where the individual matters. Models fail to track how a *particular* person's reasoning style evolves over time, leaning on surface lexical cues instead of adapting to a developing strategy (see "Can models recognize how individuals reason differently?"). And in open-ended perspective-taking, LLMs default to surface strategies rather than genuinely simulating one mind's beliefs; the gap looks architectural, not just a matter of more training (see "Do large language models genuinely simulate mental states?"). The crowd is legible because it's an average; the individual requires modeling a moving, specific target.

There's a sharper twist worth knowing. "Approximating collective judgment" hides a bias toward the *dominant* collective. Mechanistic analysis shows low-resource cultures get internally represented through high-resource proxies: the model flattens minority groups into the majority's coordinates (see "Do LLMs represent low-resource cultures through dominant cultural proxies?"). So the "collective" a model approximates isn't all of humanity; it's the loudest, most-represented slice. And even on social norms, all the models share *identical* systematic errors on unwritten rules (see "Can AI predict social norms better than humans?"), a tell that they're pattern-matching a corpus, not participating in the living process that makes norms.

The deepest framing here: a model can out-score humans as an outside *observer* predicting group behavior while remaining categorically unable to *participate* in how that behavior gets made. From the outside, it mirrors the collective; from the inside, it never enters the discourse that produces individual stances and evolving norms (see "Do humans and LLMs differ fundamentally or just superficially?"). That's the real answer to "why": collective judgment is something you can statistically reconstruct from text, while individual judgment is something you have to track, and authentic norms are something you have to help create.


Sources (7 notes)

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Can models recognize how individuals reason differently?

LLMs struggle to anchor reasoning in temporal gameplay and adapt to evolving strategies. GPT-4o relies on surface lexical cues while DeepSeek-R1 shows early promise, but dynamic style adaptation remains largely insufficient across all models tested.

Do large language models genuinely simulate mental states?

ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.

Do LLMs represent low-resource cultures through dominant cultural proxies?

Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.

Do humans and LLMs differ fundamentally or just superficially?

Applies Habermas's observer/participant distinction to AI: from outside, humans and LLMs are utterly different; from within shared discourse, both draw on the same symbolic substrate, making the difference structural rather than absolute.
