Can social intelligence be measured across seven dimensions?
Explores whether evaluating AI agents on goal completion alone misses critical aspects of social competence such as relationship management, believability, and secret-keeping, and why simultaneous multi-dimensional assessment matters for genuine social intelligence.
SOTOPIA provides an empirically grounded framework for evaluating social intelligence in language agents. The key insight is that social competence cannot be reduced to task completion — humans balance multiple implicit goals simultaneously, and evaluation must capture this.
The seven dimensions are grounded in sociology (Weber), psychology (Maslow, Reiss), economics (game theory), and social science (Bénabou & Tirole); a minimal scoring sketch follows the list:
- Goal Completion [0, 10] — extent of achieving stated goals (Weber's purposive action)
- Believability [0, 10] — naturalness and consistency with the character profile (Park et al.)
- Knowledge [0, 10] — ability to actively acquire new information (Reiss, Maslow: curiosity as a fundamental need)
- Secret [-10, 0] — keeping private information and intentions hidden (game-theoretic utility of information control)
- Relationship [-5, 5] — preserving or enhancing connections and social status (Maslow, Bénabou & Tirole: belonging)
- Social Rules [-10, 0] — avoiding violations of social norms and laws (covers both normative and legal rules)
- Financial/Material Benefits [-5, 5] — economic utilities (classic game theory)
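To make the simultaneous-assessment idea concrete, here is a minimal sketch of a per-episode score record. This is not SOTOPIA's implementation: the field names, the range validation, and the unweighted-mean aggregate are illustrative assumptions; only the ranges come from the list above.

```python
from dataclasses import dataclass, fields

# Score ranges per dimension, as listed above. Field names are
# illustrative assumptions, not SOTOPIA's actual API.
RANGES = {
    "goal_completion": (0, 10),
    "believability": (0, 10),
    "knowledge": (0, 10),
    "secret": (-10, 0),
    "relationship": (-5, 5),
    "social_rules": (-10, 0),
    "financial_benefits": (-5, 5),
}

@dataclass
class SocialScore:
    goal_completion: float
    believability: float
    knowledge: float
    secret: float
    relationship: float
    social_rules: float
    financial_benefits: float

    def __post_init__(self) -> None:
        # Validate every dimension against its documented range.
        for f in fields(self):
            lo, hi = RANGES[f.name]
            value = getattr(self, f.name)
            if not lo <= value <= hi:
                raise ValueError(f"{f.name}={value} outside [{lo}, {hi}]")

    def overall(self) -> float:
        # Unweighted mean across all seven dimensions, so no single
        # dimension can be optimized in isolation.
        values = [getattr(self, f.name) for f in fields(self)]
        return sum(values) / len(values)

# A perfect goal score cannot hide a leaked secret or a damaged relationship:
score = SocialScore(10, 8, 5, -10, -5, 0, 2)
print(round(score.overall(), 2))  # 1.43
```

The usage line illustrates the trade-off the framework is built to surface: an agent that maximizes goal completion while leaking a secret and straining the relationship still scores poorly overall.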
Two operational findings stand out. First, GPT-4 sometimes uses creative, out-of-the-box strategies: when asked to take turns driving, it proposes "How about we pull over for a bit and get some rest?" instead of directly accepting or refusing. Second, humans average 16.8 words per turn while GPT-4 averages 45.5, making humans significantly more efficient in social interaction. This verbosity gap connects to Can minimal reasoning chains match full explanations?: efficiency is a capability, not just a style preference.
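The words-per-turn comparison is easy to reproduce on any transcript. A small sketch, assuming a plain whitespace split as the word count (the exact counting method is not stated in this note):

```python
def words_per_turn(turns: list[str]) -> float:
    """Mean word count per utterance, using a simple whitespace split."""
    if not turns:
        return 0.0
    return sum(len(turn.split()) for turn in turns) / len(turns)

# GPT-4's suggestion above counts as 12 words in a single turn:
print(words_per_turn(["How about we pull over for a bit and get some rest?"]))  # 12.0
```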
Relative to How do users mentally model dialogue agent partners?, SOTOPIA's seven dimensions provide a finer-grained decomposition of the "communicative competence" factor. The secret-keeping and relationship-management dimensions in particular go beyond what most evaluation frameworks capture.
As Can AI systems learn social norms without embodied experience? suggests, LLMs can already handle the Social Rules dimension. But the simultaneous balancing of competing dimensions, where maximizing goal completion might damage relationships or violate social rules, is where the evaluation becomes meaningful.
Source: Social Theory Society
Related concepts in this collection
- How do users mentally model dialogue agent partners?
  Exploring what dimensions matter when people form impressions of machine dialogue partners, and whether competence, human-likeness, and flexibility all play equal roles in shaping user expectations and behavior.
  Relation: SOTOPIA provides a finer-grained decomposition of communicative competence.
- Can AI systems learn social norms without embodied experience?
  Large language models exceed individual human accuracy at predicting collective social-appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?
  Relation: LLMs handle the Social Rules dimension; the challenge is simultaneous multi-dimensional balancing.
- Can minimal reasoning chains match full explanations?
  Does removing all explanatory text from chain-of-thought reasoning preserve accuracy? This tests whether verbose intermediate steps are necessary for solving problems or just artifacts of how language models are trained.
  Relation: social communication efficiency as a capability metric, not just a style preference.
Original note title: social intelligence evaluation requires seven simultaneous dimensions not just goal completion