LLM Reasoning and Architecture · Language Understanding and Pragmatics · Psychology and Social Cognition

Why do reasoning models struggle with theory of mind tasks?

Extended reasoning training helps with math and coding but not social cognition. We explore whether reasoning models can track mental states the way they solve formal problems, and what that reveals about the structure of social reasoning.

Note · 2026-02-22 · sourced from Theory of Mind
How should researchers navigate LLM reasoning research? Where exactly do reasoning models break down? Why do LLMs excel at social norms yet fail at theory of mind?

ThoughtTracing, a sequential Monte Carlo (SMC)-inspired algorithm for mental state tracking, produces its most important finding not through its own performance but through what it reveals about existing reasoning models on ToM tasks.

Four behavioral patterns emerge:

  1. Reasoning models don't consistently outperform vanilla LLMs prompted with chain-of-thought. The extended reasoning training that dramatically improves math and coding does not transfer to social cognition.

  2. They fail to generalize to similar scenarios. A reasoning model that correctly tracks mental states in one ToM scenario fails on structurally similar ones — suggesting pattern matching rather than a generalizable mental state tracking mechanism.

  3. They produce significantly longer reasoning traces for ToM than for factual questions. The model "knows" social reasoning is hard and allocates more tokens to it, but this effort is unproductive.

  4. Reasoning effort (output length) does not correlate with performance; more thinking does not help (a sketch of this check follows the list). This is the sharpest contrast with formal domains, where longer chains generally improve accuracy up to a threshold.
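
Pattern 4 is straightforward to probe on one's own evaluation logs. Here is a minimal sketch, assuming per-item records with a reasoning-trace token count and a correctness flag; the `results` data and field names are illustrative, not from the paper:

```python
from scipy.stats import spearmanr

# Illustrative eval records: token count of the model's reasoning trace
# and whether the final ToM answer was correct. These numbers are made
# up for the sketch; plug in real logs.
results = [
    {"trace_tokens": 412,  "correct": 1},
    {"trace_tokens": 1880, "correct": 0},
    {"trace_tokens": 951,  "correct": 1},
    {"trace_tokens": 2310, "correct": 0},
    {"trace_tokens": 705,  "correct": 1},
    {"trace_tokens": 1490, "correct": 0},
]

lengths = [r["trace_tokens"] for r in results]
correct = [r["correct"] for r in results]

# Rank correlation between reasoning effort and accuracy. On math or
# coding evals this is typically positive up to a saturation point; the
# finding above is that on ToM tasks it is not.
rho, p = spearmanr(lengths, correct)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
```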

These patterns suggest social reasoning is "a different category" from mathematical or programming reasoning, the domains "where reasoning models typically excel." The authors explicitly position this as a domain where inference-time reasoning research has been neglected.

The ThoughtTracing algorithm itself offers a clue about what social reasoning requires that formal reasoning doesn't: hypothesis-driven Bayesian tracking of multiple evolving mental state possibilities, weighted by observation likelihood. This is structurally different from derivational chains. Social reasoning requires maintaining multiple simultaneous models of what different agents believe, not sequentially deriving conclusions from premises. The algorithm outperforms reasoning models (including o3-mini and R1) using "significantly shorter reasoning traces" — suggesting efficiency comes from the right structure, not more tokens.
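
To make the structural contrast concrete, here is a runnable toy of the SMC-style loop the paragraph describes, applied to a Sally-Anne-style false-belief scenario. This is a sketch of the general particle-filtering idea, not the authors' ThoughtTracing implementation: in the real algorithm, hypothesis proposal and likelihood scoring are LLM calls, while here they are hand-coded stand-ins.

```python
import random
from collections import Counter

random.seed(0)

LOCATIONS = ["basket", "box"]

def likelihood(belief, event):
    # How well does a hypothesized belief explain the evidence? If Sally
    # witnessed the move, hypotheses matching the new location fit well;
    # if she was absent, the event carries no information about her
    # belief, so every hypothesis fits equally.
    if event["sally_present"]:
        return 0.95 if belief == event["marble_moved_to"] else 0.05
    return 0.5

def track_belief(events, n_particles=200):
    # Particles are hypotheses about where Sally believes the marble is.
    particles = [random.choice(LOCATIONS) for _ in range(n_particles)]
    for event in events:
        # Weight each hypothesis by observation likelihood, then resample
        # to concentrate mass on hypotheses the evidence supports. (In
        # ThoughtTracing proper, an LLM also proposes new hypotheses at
        # each step; this toy just carries the old ones forward.)
        weights = [likelihood(b, event) for b in particles]
        particles = random.choices(particles, weights=weights, k=n_particles)
    return Counter(particles)

# Sally sees the marble placed in the basket, then leaves; Anne moves it
# to the box while Sally is away.
events = [
    {"marble_moved_to": "basket", "sally_present": True},
    {"marble_moved_to": "box",    "sally_present": False},
]
print(track_belief(events))
# Mass stays on "basket": the tracker maintains Sally's (false) belief
# as a live hypothesis rather than deriving the marble's true location.
```

The payoff is in the final state: the answer to "where will Sally look?" falls out of the maintained hypothesis distribution, whereas a derivational chain has to reconstruct her perspective from premises at every step.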


Source: Theory of Mind

social reasoning differs categorically from formal reasoning — reasoning effort does not correlate with ToM performance