Do LLMs predict social norms more accurately than individual behavior?

This explores a contrast the corpus draws sharply: LLMs predicting what a group considers appropriate (social norms) versus predicting what a specific person will do or think (individual behavior and mental states). On the norm side, the evidence is striking. GPT-4.5 judged the social appropriateness of 555 scenarios more accurately than *every individual human rater* it was compared against, scoring at the 100th percentile, and Gemini and Claude also scored above 96% (Can AI systems learn social norms without embodied experience?; Can AI learn social norms better than humans?). That is the surprising part: this happened *without* embodied cultural experience, which had been assumed necessary for knowing unwritten rules.

But "more accurate than individual behavior" turns out to be the wrong framing, because the corpus suggests the two tasks aren't on the same axis. Norm prediction is essentially aggregate statistics: what does the average person consider appropriate? Individual behavior is a different target, and here LLMs stumble. While GPT-4.5 tops the charts on norms, o1 and Claude 3.7 *regress* on theory-of-mind tasks like Decrypto, and added reasoning effort doesn't rescue them (Why do LLMs excel at social norms yet fail at theory of mind?; Why do AI systems fail at social and cultural interpretation?). Models of this generation master social statistics while missing actual mental-state reasoning about a specific other person.
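To make the "aggregate statistics" point concrete, here is a toy simulation, not a reconstruction of the cited study. It assumes each rating is a shared norm plus a rater-specific deviation, scores each human against the mean of the *other* raters, and scores a hypothetical model that tracks the shared norm with a little noise of its own. All parameters (`n_raters`, `rater_sd`, `model_sd`) are illustrative assumptions. Under these assumptions the consensus-tracker beats every individual rater, yet its error when asked what one specific person said is several times larger.

```python
import random
import statistics


def simulate(n_raters=50, n_scenarios=555, rater_sd=1.0, model_sd=0.2, seed=0):
    """Toy model (assumption): each rating = shared norm + that rater's own deviation."""
    rng = random.Random(seed)
    norms = [rng.uniform(1.0, 5.0) for _ in range(n_scenarios)]
    ratings = [[norm + rng.gauss(0.0, rater_sd) for norm in norms] for _ in range(n_raters)]
    # Hypothetical "model": tracks the shared norm, with small noise of its own.
    model = [norm + rng.gauss(0.0, model_sd) for norm in norms]
    return ratings, model


def mae(xs, ys):
    """Mean absolute error between two equal-length sequences."""
    return statistics.fmean(abs(x - y) for x, y in zip(xs, ys))


def consensus(ratings, exclude=None):
    """Per-scenario mean rating, optionally leaving one rater out."""
    panel = [r for i, r in enumerate(ratings) if i != exclude]
    return [statistics.fmean(col) for col in zip(*panel)]


ratings, model = simulate()
full = consensus(ratings)

# Each rater is scored against the consensus of everyone else (leave-one-out).
rater_errors = [mae(r, consensus(ratings, exclude=i)) for i, r in enumerate(ratings)]
# The model is not part of the panel, so it is scored against the full consensus.
model_error = mae(model, full)
# Percentile = share of human raters whose error is strictly larger than the model's.
percentile = 100.0 * sum(model_error < e for e in rater_errors) / len(rater_errors)

# The same predictor, asked what each specific rater said.
per_person_errors = [mae(model, r) for r in ratings]

print(f"model error vs consensus:         {model_error:.2f}")
print(f"best individual rater:            {min(rater_errors):.2f}")
print(f"model percentile:                 {percentile:.0f}")
print(f"model error vs one person (mean): {statistics.fmean(per_person_errors):.2f}")
```

The parameters are arbitrary; the point is only that beating every rater at predicting the average does not imply accuracy about any one rater, since each person's idiosyncratic deviation is exactly what a consensus-tracker averages away.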

There's a second crack: even the norm-prediction triumph hides a uniformity problem. All the top models share *identical systematic errors* on the hardest unwritten norms (Can AI learn social norms better than humans?; Can AI systems learn social norms without embodied experience?). So it's not that they understand norms; they pattern-match the consensus very well, then fail together in the same blind spot, which is what you'd expect from statistics rather than participation. A related note argues this gap is structural: AI can predict appropriateness with superhuman accuracy yet cannot enter the community processes that *create and validate* norms in the first place (Can AI predict social norms better than humans?).

Interestingly, the picture changes when LLMs are pointed at individual decision-making with the right training. Models fine-tuned on psychology-experiment data predict human choices more accurately than theory-driven cognitive models, capture individual differences in their embeddings, and transfer across tasks without task-specific design (Can language models learn to model human decision making?). And persona-based simulations replicated about 76% of published experimental main effects (84 of 111 from Journal of Marketing experiments), but reliably only when the original effect was strong; on marginal effects they produced both false positives and false negatives (Can AI personas reliably replicate human experiment results?). So individual behavior isn't beyond reach; it needs targeted training and degrades at the edges.
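The "about 76%" figure corresponds to the 84 of 111 main effects reported in the source note. One simple way to see how replication depends on effect strength is to stratify replication success by the original p-value. The sketch below does that with made-up records; the bin cutoffs, the function name `replication_by_strength`, and the toy data are assumptions for illustration, not the study's method or numbers.

```python
from collections import defaultdict


def replication_by_strength(results, cutoffs=(0.001, 0.01, 0.05)):
    """Group (original_p, replicated) pairs into p-value bins; return the success rate per bin."""
    bins = defaultdict(lambda: [0, 0])  # label -> [replicated count, total count]
    for p, replicated in results:
        # First cutoff the p-value falls under; anything else goes in the top bin.
        label = next((f"p < {c}" for c in cutoffs if p < c), f"p >= {cutoffs[-1]}")
        bins[label][0] += int(replicated)
        bins[label][1] += 1
    return {label: hits / total for label, (hits, total) in bins.items()}


overall = 84 / 111
print(f"overall replication rate: {overall:.1%}")

# Hypothetical records, for illustration only: (original p-value, did the persona run replicate it?)
toy = [(0.0001, True), (0.0004, True), (0.003, True), (0.008, False),
       (0.02, True), (0.04, False), (0.045, False)]
print(replication_by_strength(toy))
```

A falling success rate from the strongest bin to the weakest is the pattern the source describes; with real data one would also want per-bin counts, since marginal bins tend to be small.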

The deeper thread worth carrying away: where LLMs look socially competent, they're often borrowing structure they didn't earn. Simulations succeed when one model secretly controls every interlocutor and collapse once agents hold private information from each other, because the apparent competence relied on grounding work the model skipped (Why do LLMs fail when simulating agents with private information?). So the honest answer is: yes, LLMs predict collective norms better than individual humans can, but that is a statistical feat, not social understanding, and it tells you less about predicting any one person than the headline number implies.

Sources (8 notes)

Can AI systems learn social norms without embodied experience?

GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.

Can AI learn social norms better than humans?

GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.

Can AI predict social norms better than humans?

GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.

Why do LLMs excel at social norms yet fail at theory of mind?

GPT-4.5 reaches the 100th percentile on social norm prediction, yet o1 and Claude 3.7 regress on theory-of-mind tasks like Decrypto. Open-ended scenarios expose surface-level strategies hidden by structured questions, and reasoning effort does not improve social reasoning performance.

Why do AI systems fail at social and cultural interpretation?

LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally resonant interpretations. The pattern shows that statistical competence coexists with an absence of actual social understanding and participation.

Can language models learn to model human decision making?

LLMs finetuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer learning across tasks without task-specific design.

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments, with replication success strongly correlated with the strength of the original p-value. Replication of marginal effects was unreliable, with both false positives and false negatives.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
