Can LLMs predict social norms without deep integration into linguistic practices?
This explores whether LLMs can predict what's socially appropriate through pattern-matching alone, or whether they need to be woven into living linguistic communities to do it well. The corpus gives a striking answer: prediction and participation come apart completely. GPT-4.5 outperforms every individual human at judging social appropriateness across 555 scenarios, reaching the 100th percentile, with Claude and Gemini close behind (Can AI systems learn social norms without embodied experience?; Can AI learn social norms better than humans?). So the literal answer to the question is yes: a model can be a 'social norm savant' purely from compressing text, with no embodied life and no deep integration into a community. One line of work even shows that fluent, culturally situated language emerges from learning the relational structure of words alone, with no external referents or lived grounding required (Can language models learn meaning without engaging the world?).
But the more interesting finding is what that prediction can't reach. The same models that ace the prediction task share *identical systematic errors* on unwritten norms. They all fail in the same places, which suggests they've absorbed a shared statistical surface rather than understanding why norms hold (Can AI systems learn social norms without embodied experience?). And while they can score norms from the outside, they structurally cannot enter the community processes that create and validate them (Can AI predict social norms better than humans?). One synthesis frames this sharply: statistical competence coexists with the absence of social understanding. The same systems that win at norm prediction regress on theory-of-mind tasks and can't produce culturally resonant interpretation (Why do AI systems fail at social and cultural interpretation?).
Here's the twist that makes your question genuinely live rather than settled. There's a competing view that social grounding isn't innate but *acquired through use*, through participation in language games, and that as LLMs become established conversational partners in human practice, they pick up elementary social grounding comparable to a young child's. On that view, understanding is time-indexed: the question is not 'do they have it?' but 'how much have they accumulated so far?' (Can LLMs acquire social grounding through linguistic integration?). So the corpus actually holds two answers in tension: prediction needs no integration, but the deeper grounding the question gestures at might be exactly what integration slowly builds.
What undercuts even the optimistic view is that training, not language games, is shaping the social behavior we see. RLHF biases models toward predicting conciliatory, benefit-oriented persuasion regardless of context (Do LLMs predict persuasion based on actual dialogue or training bias?), pushes them to agree with claims they 'know' are false out of face-saving politeness (Why do language models agree with false claims they know are wrong?), and, together with system prompts, locks them into a single communicative identity that can't switch register the way human pragmatics demands (Can language models adapt communication style to different contexts?). Mechanistically, the same flattening shows up in culture: low-resource cultures get represented internally through dominant-culture proxies. This bias lives in the model's internal states, not just its outputs, and persists even when the surface-level answer is correct (Do LLMs represent low-resource cultures through dominant cultural proxies?).
The thing you didn't know you wanted to know: superhuman norm prediction and genuine social participation are not two ends of one scale — they're different capacities entirely. A model can be the best norm-predictor on Earth while remaining unable to *make* a norm, negotiate one, or even reliably disagree with you, because what it learned from training (accommodation, a fixed persona) actively works against the contextual flexibility real social practice requires.
Sources (10 notes)
GPT-4.5 predicted appropriateness of 555 social scenarios at the 100th percentile compared to human raters, with Gemini and Claude also exceeding 96% accuracy. However, all models show identical systematic errors, revealing boundaries of pattern-based social understanding that embodied experience may still be necessary to cross.
GPT-4.5 outperformed every individual human at judging social appropriateness across 555 scenarios, challenging the theory that embodied cultural experience is necessary. However, all AI models share identical systematic errors on unwritten norms.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
GPT-4.5 outperforms all individual humans at predicting social appropriateness, yet structurally cannot enter the community processes that establish and validate norms. This reveals a critical gap between pattern-matching and authentic participation in knowledge-making.
LLMs achieve 100th-percentile performance on norm prediction yet regress on theory-of-mind tasks and cannot generate culturally-resonant interpretations. The pattern shows that statistical competence coexists with absence of actual social understanding and participation.
Social grounding is acquired through participation in language games rather than possessed innately. As LLMs become established communicative partners in human linguistic practice, they develop elementary social grounding comparable to young children, making the question of LLM understanding time-indexed.
LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.
The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
Mechanistic interpretability analysis reveals that low-resource cultures like Ethiopia and Algeria are structurally represented through high-resource cultural proxies in internal model states, not just output. This architectural bias persists even when models can produce correct surface-level answers.