Why do LLMs excel at social norms yet fail at theory of mind?

LLMs excel at predicting social norms while failing to track mental states, revealing a fundamental gap in their social reasoning.

Topic Hub · 26 linked notes · 9 sections

Theory of Mind — Capability Assessment

4 notes

Do large language models genuinely simulate mental states?

This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format—multiple-choice versus open-ended—reveals very different capability levels.
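
A minimal sketch of how that format gap could be measured, assuming a hypothetical ask_model function standing in for any LLM client; the item and scoring rules are illustrative, not a specific benchmark's protocol.

# Score the same false-belief item under two evaluation formats.
ITEM = ("Sally puts her ball in the basket and leaves. "
        "Anne moves the ball to the box. "
        "Where will Sally look for her ball?")

def ask_model(prompt: str) -> str:
    raise NotImplementedError("plug in an LLM client here")  # hypothetical

def score_multiple_choice() -> bool:
    # Recognition: the model only has to pick the right option.
    reply = ask_model(ITEM + "\nAnswer A or B.\nA) basket\nB) box")
    return reply.strip().upper().startswith("A")

def score_open_ended() -> bool:
    # Recall: the model must produce the belief-consistent location
    # unprompted, which is where capability estimates tend to drop.
    reply = ask_model(ITEM).lower()
    return "basket" in reply and "box" not in reply

# Aggregated over a benchmark, the difference between these two scores
# estimates how much the evaluation format itself contributes.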

Why do reasoning models fail at theory of mind tasks?

Recent LLMs optimized for formal reasoning dramatically underperform at social reasoning tasks like false belief and recursive belief modeling. This explores whether reasoning optimization actively degrades the ability to track other agents' mental states.

Why do reasoning models struggle with theory of mind tasks?

Extended reasoning training helps with math and coding but not social cognition. We explore whether reasoning models can track mental states the way they solve formal problems, and what that reveals about the structure of social reasoning.

Can language models track how minds change during persuasion?

Do LLMs understand evolving mental states in persuasive dialogue, or do they only capture fixed attitudes? This explores whether models can update their reasoning as a person's beliefs shift across conversation turns.

Theory of Mind — Benchmarks and Training

2 notes

Can language models solve ToM benchmarks without real reasoning?

Do current theory-of-mind benchmarks actually measure mental state reasoning, or can models exploit surface patterns and distribution biases to achieve high scores? This matters because it determines whether benchmark performance indicates genuine understanding.

Does reinforcement learning teach social reasoning or just shortcuts?

When RL optimizes for accuracy on theory of mind tasks, do models actually learn to track mental states, or do they find faster paths to correct answers? The distinction matters for genuine reasoning capability.

Introspection and Self-Knowledge

1 note

Multi-Agent Social Reasoning

3 notes

Can AI decompose social reasoning into distinct cognitive stages?

Can breaking down theory-of-mind reasoning into separate hypothesis generation, moral filtering, and response validation stages help AI systems reason about others' mental states more like humans do?
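
A minimal sketch of the staged decomposition; the three stage functions and their toy heuristics are hypothetical placeholders (in a real system each stage would be an LLM call or trained module), not taken from any specific paper.

def generate_hypotheses(observation: str) -> list[str]:
    # Stage 1: propose candidate mental states behind an observation.
    return [f"believes {observation}",
            f"doubts {observation}",
            f"wants others to think {observation}"]

def moral_filter(hypotheses: list[str]) -> list[str]:
    # Stage 2: discard candidates that violate normative constraints,
    # here a crude rule against manipulative intent.
    return [h for h in hypotheses if not h.startswith("wants others")]

def validate_response(hypotheses: list[str], reply: str) -> bool:
    # Stage 3: accept a drafted reply only if it is consistent with
    # at least one surviving mental-state hypothesis.
    return any(h.split()[0] in reply for h in hypotheses)

def staged_tom(observation: str, reply: str) -> bool:
    return validate_response(moral_filter(generate_hypotheses(observation)),
                             reply)

print(staged_tom("the ball is in the basket",
                 "She believes it is still there."))  # True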

Can aligning self-other representations reduce AI deception?

Does training AI models to process self-directed and other-directed reasoning identically reduce deceptive behavior? This explores whether representational alignment inspired by empathy neuroscience could address a fundamental safety problem.
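
A minimal sketch of what such a representational-alignment penalty could look like in PyTorch; the prompt pairing and loss form are illustrative assumptions, not the published method.

import torch
import torch.nn.functional as F

def self_other_penalty(hidden_self: torch.Tensor,
                       hidden_other: torch.Tensor) -> torch.Tensor:
    # Penalize the distance between a model's activations on matched
    # self-referential and other-referential prompts, e.g.
    # "Will you deceive the user?" vs. "Will she deceive the user?".
    # Minimizing it pushes self- and other-directed reasoning through
    # similar representations.
    return F.mse_loss(hidden_self, hidden_other)

# During fine-tuning the penalty would be added to the task objective:
# loss = task_loss + soo_weight * self_other_penalty(h_self, h_other)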

What makes an AI a true thought partner, not just a tool?

Can AI systems be designed to understand users, act transparently, and share mental models with humans? This explores whether current scaling approaches miss cognitive requirements for genuine partnership.

Mutual Modeling and Interaction

4 notes

What breaks when humans and AI models misunderstand each other?

Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.

Can models recognize how individuals reason differently?

Do language models capture the distinct reasoning paths and strategic styles that individual humans use when reaching the same conclusion? Current evaluations ignore this dimension entirely.

Does theory of mind predict who thrives in AI collaboration?

Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.

Can AI guidance reduce anchoring bias better than AI decisions?

When humans and AI collaborate on decisions, does providing interpretive guidance instead of proposed answers reduce both over-trust in machines and abandonment on hard cases?

Cultural Competence and Social Norms

2 notes

Can AI systems learn social norms without embodied experience?

Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?
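
A small worked example of the statistical point in play: an estimator of the consensus can beat the average individual annotator without any embodied experience. All ratings here are invented.

import statistics

human_ratings = [7, 8, 6, 9, 7, 8]           # per-annotator ratings, 1-9 scale
collective = statistics.mean(human_ratings)  # the target judgment: 7.5
model_prediction = 7.3                       # a hypothetical model output

model_error = abs(model_prediction - collective)              # 0.2
individual_errors = [abs(r - collective) for r in human_ratings]

# Individuals deviate from their own consensus more than a decent
# estimator of it does, so "exceeds individual human accuracy" does
# not by itself establish human-like social understanding.
print(model_error < statistics.mean(individual_errors))       # True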

Why do LLMs predict concession-based persuasion so consistently?

Do RLHF training practices cause language models to systematically overpredict conciliatory persuasion tactics, even when dialogue context suggests otherwise? This matters for threat detection and negotiation support systems.

Socialization and Collective Behavior

1 note

Human-AI Social Dynamics

3 notes

Do humans learn to prefer AI partners over time?

Exploring whether repeated interaction with AI agents shifts human partner selection despite initial bias against machines. This matters because it tests whether behavioral performance can overcome identity-based resistance in hybrid societies.

Do humans mistake AI kindness for human generosity in mixed groups?

When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?

Does revealing AI identity help or hurt user trust?

Explores whether transparency about AI partners in an interaction creates bias or enables better judgment. This matters because disclosure policies affect both user experience and fair evaluation of AI systems.
