AI Models Exceed Individual Human Accuracy in Predicting Everyday Social Norms

Paper · arXiv 2508.19004 · Published August 26, 2025

A fundamental question in cognitive science concerns how social norms are acquired and represented. While humans typically learn norms through embodied social experience, we investigated whether large language models can achieve sophisticated norm understanding through statistical learning alone. Across two studies, we systematically evaluated multiple AI systems' ability to predict human social appropriateness judgments for 555 everyday scenarios, comparing how closely each model predicted the average human judgment with how closely each individual participant's ratings matched that average. In Study 1, GPT-4.5’s accuracy in predicting the collective judgment on a continuous scale exceeded that of every human participant (100th percentile). Study 2 replicated this result, with Gemini 2.5 Pro outperforming 98.7% of humans, GPT-5 outperforming 97.8%, and Claude Sonnet 4 outperforming 96.0%. Despite this predictive power, all models showed systematic, correlated errors. These findings demonstrate that sophisticated models of social cognition can emerge from statistical learning over linguistic data alone, challenging strong versions of theories emphasizing the exclusive necessity of embodied experience for cultural competence. The systematic nature of AI limitations across different architectures indicates potential boundaries of pattern-based social understanding, while the models’ ability to outperform nearly all individual humans in this predictive task suggests that language serves as a remarkably rich repository for cultural knowledge transmission.

Consider, for example, how appropriate it is to laugh at a job interview, cry on a bus, or read in church. These judgments involve nuanced social understanding that goes far beyond knowing what behaviors are physically possible.

Across both studies, our central research question is the same: if an AI system were evaluated as just another participant in a social cognition study, would its performance fall within the range of typical human variation, or would it model the collective norm more accurately than a typical individual? As our results will show, the AI's grasp of our social world is not only highly accurate; in reflecting the collective consensus, its predictive accuracy exceeds that of the vast majority of individual humans, a finding with profound implications for cognitive science and AI development.

The tasks for the AI and human participants are not identical: the AI is prompted to perform a meta-cognitive task of predicting the group average, whereas each human provides their own appropriateness rating. Our test is therefore a direct assessment of the model's ability to extract a collective social signal from its training data, which is distinct from possessing genuine, human-like social understanding. Nonetheless, comparing the AI's predictive accuracy with individual humans' deviation from the mean is justified on the theoretical premise that a judgment of social appropriateness is less a statement of personal preference than an individual's report on their perception of a shared, collective cultural norm. In this view, each human rating is an estimate of this societal standard. Thus, although the explicit instructions differ, both the AI and the human participants are accessing and representing a collective consensus, which allows a meaningful comparison of their accuracy relative to that consensus.
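To make the comparison concrete, the following is a minimal sketch of how a model's percentile standing among human participants could be computed. It is an illustration under stated assumptions, not the procedure used in the studies: the error metric (mean absolute deviation from the per-scenario average), the inclusion of each participant's own rating in that average, and all function names and toy ratings are hypothetical.

```python
from statistics import fmean


def consensus_by_scenario(ratings):
    """Average rating per scenario.

    ratings: one list per participant, each holding one rating per scenario.
    Assumption: every participant rated every scenario.
    """
    return [fmean(column) for column in zip(*ratings)]


def mean_abs_error(values, target):
    """Mean absolute deviation between one rater's values and the target.

    Assumption: absolute error is used here only for illustration; the
    studies may rely on a different accuracy measure.
    """
    return fmean([abs(v - t) for v, t in zip(values, target)])


def percent_of_humans_outperformed(ratings, model_predictions):
    """Percentage of participants whose error is strictly larger than the model's."""
    consensus = consensus_by_scenario(ratings)
    model_error = mean_abs_error(model_predictions, consensus)
    human_errors = [mean_abs_error(r, consensus) for r in ratings]
    beaten = sum(1 for e in human_errors if e > model_error)
    return 100.0 * beaten / len(human_errors)


# Toy data (hypothetical): 4 participants rating 3 scenarios.
toy_ratings = [
    [1, 5, 9],
    [2, 6, 8],
    [0, 4, 9],
    [3, 5, 7],
]
toy_model = [1.5, 5.0, 8.0]  # hypothetical model predictions of the average

print(percent_of_humans_outperformed(toy_ratings, toy_model))
```

In this sketch, a model that lands closer to the per-scenario average than every participant scores 100, which corresponds to the "100th percentile" wording used for Study 1. A leave-one-out average (excluding each participant's own rating from their target) would be a stricter variant of the same idea.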

This theoretical framework is supported by extensive research in social cognition demonstrating that individual appropriateness judgments are not purely idiosyncratic preferences but rather reflect individuals' estimates of shared cultural standards (Zou et al., 2009). Social psychological research on pluralistic ignorance shows that people consistently attempt to infer others' attitudes when making social judgments (Miller & McFarland, 1987), suggesting that appropriateness ratings inherently involve a meta-cognitive component similar to the explicit prediction task given to the AI.