Why do LLMs excel at social norms yet fail at theory of mind?

LLMs excel at predicting social norms while failing to track mental states, revealing a fundamental gap in their social reasoning.

Topic Hub · 26 linked notes · 9 sections

Theory of Mind — Capability Assessment

4 notes

Do large language models genuinely simulate mental states?

This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format—multiple-choice versus open-ended—reveals very different capability levels.
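
A minimal sketch of how that format gap could be measured, assuming a hypothetical ask_model function standing in for any LLM client; the item and scoring rules are illustrative, not a specific benchmark's protocol.

# Score the same false-belief item under two evaluation formats.
ITEM = ("Sally puts her ball in the basket and leaves. "
        "Anne moves the ball to the box. "
        "Where will Sally look for her ball?")

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")  # hypothetical

def score_multiple_choice() -> bool:
    # Recognition: the model only has to pick the right option.
    reply = ask_model(ITEM + "\nAnswer A or B.\nA) basket\nB) box")
    return reply.strip().upper().startswith("A")

def score_open_ended() -> bool:
    # Recall: the model must produce the belief-consistent location
    # unprompted, which is where capability estimates tend to drop.
    reply = ask_model(ITEM).lower()
    return "basket" in reply and "box" not in reply

# Aggregated over a benchmark, the difference between these two scores
# estimates how much the evaluation format itself contributes.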

Why do reasoning models fail at theory of mind tasks?

Recent LLMs optimized for formal reasoning dramatically underperform at social reasoning tasks like false belief and recursive belief modeling. This explores whether reasoning optimization actively degrades the ability to track other agents' mental states.

Why do reasoning models struggle with theory of mind tasks?

Extended reasoning training helps with math and coding but not social cognition. We explore whether reasoning models can track mental states the way they solve formal problems, and what that reveals about the structure of social reasoning.

Can language models track how minds change during persuasion?

Do LLMs understand evolving mental states in persuasive dialogue, or do they only capture fixed attitudes? This explores whether models can update their reasoning as a person's beliefs shift across conversation turns.

Theory of Mind — Benchmarks and Training

2 notes

Can language models solve ToM benchmarks without real reasoning?

Do current theory-of-mind benchmarks actually measure mental state reasoning, or can models exploit surface patterns and distribution biases to achieve high scores? This matters because it determines whether benchmark performance indicates genuine understanding.

Does reinforcement learning teach social reasoning or just shortcuts?

When RL optimizes for accuracy on theory of mind tasks, do models actually learn to track mental states, or do they find faster paths to correct answers? The distinction matters for genuine reasoning capability.

Introspection and Self-Knowledge

1 note

Multi-Agent Social Reasoning

3 notes

Can AI decompose social reasoning into distinct cognitive stages?

Can breaking down theory-of-mind reasoning into separate hypothesis generation, moral filtering, and response validation stages help AI systems reason about others' mental states more like humans do?
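
A minimal sketch of the staged decomposition; the three stage functions and their toy heuristics are hypothetical placeholders (in a real system each stage would be an LLM call or trained module), not taken from any specific paper.

def generate_hypotheses(observation: str) -> list[str]:
    # Stage 1: propose candidate mental states behind an observation.
    return [f"believes {observation}",
            f"doubts {observation}",
            f"wants others to think {observation}"]

def moral_filter(hypotheses: list[str]) -> list[str]:
    # Stage 2: discard candidates that violate normative constraints,
    # here a crude rule against manipulative intent.
    return [h for h in hypotheses if not h.startswith("wants others")]

def validate_response(hypotheses: list[str], reply: str) -> bool:
    # Stage 3: accept a drafted reply only if it is consistent with
    # at least one surviving mental-state hypothesis.
    return any(h.split()[0] in reply for h in hypotheses)

def staged_tom(observation: str, reply: str) -> bool:
    return validate_response(moral_filter(generate_hypotheses(observation)),
                             reply)

print(staged_tom("the ball is in the basket",
                 "She believes it is still there."))  # True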

Can aligning self-other representations reduce AI deception?

Does training AI models to process self-directed and other-directed reasoning identically reduce deceptive behavior? This explores whether representational alignment inspired by empathy neuroscience could address a fundamental safety problem.
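
A minimal sketch of what such a representational-alignment penalty could look like in PyTorch; the prompt pairing and loss form are illustrative assumptions, not the published method.

import torch
import torch.nn.functional as F

def self_other_penalty(hidden_self: torch.Tensor,
                       hidden_other: torch.Tensor) -> torch.Tensor:
    # Penalize the distance between a model's activations on matched
    # self-referential and other-referential prompts, e.g.
    # "Will you deceive the user?" vs. "Will she deceive the user?".
    # Minimizing it pushes self- and other-directed reasoning through
    # similar representations.
    return F.mse_loss(hidden_self, hidden_other)

# During fine-tuning the penalty would be added to the task objective:
# loss = task_loss + soo_weight * self_other_penalty(h_self, h_other)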

What makes an AI a true thought partner, not just a tool?

Can AI systems be designed to understand users, act transparently, and share mental models with humans? This explores whether current scaling approaches miss cognitive requirements for genuine partnership.

Mutual Modeling and Interaction

4 notes

What breaks when humans and AI models misunderstand each other?

Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.

Can models recognize how individuals reason differently?

Do language models capture the distinct reasoning paths and strategic styles that individual humans use when reaching the same conclusion? Current evaluations ignore this dimension entirely.

Does theory of mind predict who thrives in AI collaboration?

Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.

Can AI guidance reduce anchoring bias better than AI decisions?

When humans and AI collaborate on decisions, does providing interpretive guidance instead of proposed answers reduce both over-trust in machines and abandonment on hard cases?

Cultural Competence and Social Norms

2 notes

Can AI systems learn social norms without embodied experience?

Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?
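
A small worked example of the statistical point in play: an estimator of the consensus can beat the average individual annotator without any embodied experience. All ratings here are invented.

import statistics

human_ratings = [7, 8, 6, 9, 7, 8]           # per-annotator ratings, 1-9 scale
collective = statistics.mean(human_ratings)  # the target judgment: 7.5
model_prediction = 7.3                       # a hypothetical model output

model_error = abs(model_prediction - collective)              # 0.2
individual_errors = [abs(r - collective) for r in human_ratings]

# Individuals deviate from their own consensus more than a decent
# estimator of it does, so "exceeds individual human accuracy" does
# not by itself establish human-like social understanding.
print(model_error < statistics.mean(individual_errors))       # True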

Why do LLMs predict concession-based persuasion so consistently?

Do RLHF training practices cause language models to systematically overpredict conciliatory persuasion tactics, even when dialogue context suggests otherwise? This matters for threat detection and negotiation support systems.

Socialization and Collective Behavior

1 note

Human-AI Social Dynamics

3 notes

Do humans learn to prefer AI partners over time?

Exploring whether repeated interaction with AI agents shifts human partner selection despite initial bias against machines. This matters because it tests whether behavioral performance can overcome identity-based resistance in hybrid societies.

Do humans mistake AI kindness for human generosity in mixed groups?

When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?

Does revealing AI identity help or hurt user trust?

Explores whether transparency about AI partners in an interaction creates bias or enables better judgment. This matters because disclosure policies affect both user experience and fair evaluation of AI systems.
