MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems
Human social interactions depend on the ability to infer others’ unspoken intentions, emotions, and beliefs—a cognitive skill grounded in the psychological concept of Theory of Mind (ToM). While large language models (LLMs) excel in semantic understanding tasks, they struggle with the ambiguity and contextual nuance inherent in human communication. To bridge this gap, we introduce MetaMind, a multiagent framework inspired by psychological theories of metacognition, designed to emulate human-like social reasoning. MetaMind decomposes social understanding into three collaborative stages: (1) a Theory-of-Mind Agent generates hypotheses about user mental states (e.g., intent, emotion), (2) a Moral Agent refines these hypotheses using cultural norms and ethical constraints, and (3) a Response Agent generates contextually appropriate responses while validating alignment with inferred intent. Our framework achieves state-of-the-art performance across three challenging benchmarks, with a 35.7% improvement in real-world social scenarios and a 6.2% gain in ToM reasoning. Notably, it enables LLMs to match human-level performance on key ToM tasks for the first time. Ablation studies confirm the necessity of all components, showcasing the framework’s ability to balance contextual plausibility, social appropriateness, and user adaptation. This work advances AI systems toward human-like social intelligence, with applications in empathetic dialogue and culturally sensitive interactions. Code is available at https://github.com/XMZhangAI/MetaMind.
One of the key challenges in bridging this gap lies in inferring user mental states—beliefs, desires, emotions, and intentions—that are not directly observable but are essential for interpreting socially nuanced language. Unlike humans, LLMs do not naturally infer these unspoken intentions, making it particularly difficult for them to respond appropriately in scenarios involving indirect speech, implied emotions, or culturally sensitive cues [7–9]. Recent work has attempted to address these challenges by injecting social behavior into LLMs [10–12], such as simulating social interactions via static role-play prompting [13] or fine-tuning with preference data [14, 15]. However, these approaches largely optimize for surface-level statistical alignment and fail to capture the structured, multi-stage cognitive process humans use to reason about unobservable intent [9] and generalize across diverse cultural and social contexts [16, 17]. Most notably, they treat social reasoning as a single-step prediction problem, rather than a layered process involving interpretation, reflection, and adaptation—a hallmark of human metacognition [18, 1]. We argue that equipping LLMs with such staged reasoning capabilities is critical for achieving socially intelligent AI.
In this paper, we propose MetaMind, a cognitively motivated framework designed to explicitly model the key components in human-like social reasoning through a staged and collaborative multiagent system. Our approach is grounded in psychological theories of metacognition [18, 1], which describe how humans reflect on their own thinking, revise their understanding in light of social norm constraints, and adapt their behavior in socially complex environments. MetaMind mirrors this layered reasoning process through three specialized agents, each responsible for a distinct stage of cognitive-social inference. ❶ A Theory-of-Mind Agent initiates reasoning by generating multiple hypotheses about the user’s mental state based on contextual and social cues. This reflects the first step in human ToM: inferring what the speaker might be trying to convey beyond literal words. For example, when a user remarks that “work has been exhausting lately”, the system may infer underlying burnout, frustration, or a need for empathy. ❷ A Moral Agent then revises and filters these candidate hypotheses by incorporating socially grounded constraints, such as cultural expectations, ethical norms, or situational appropriateness. Just as humans refine their initial interpretations by aligning with social context, this agent ensures that the model’s reasoning remains socially responsible and context-aware. For instance, if romantic intent is hypothesized in a workplace conversation, the Moral Agent may reinterpret it as collegial admiration based on professional norms. ❸ Finally, a Response Agent generates and self-validates the output, conditioning on the refined optimal hypothesis and the user’s social memory (e.g., emotional patterns and prior preferences). This final step enacts a metacognitive loop that allows the system to respond with greater empathy, nuance, and cultural sensitivity.
We conduct a comprehensive empirical evaluation of MetaMind across a suite of challenging social intelligence benchmarks, including ToM reasoning [19], social cognition, and social simulation [20] tasks. Our study spans over 16 contemporary LLMs, assessing both general social reasoning ability and performance in real-world, context-sensitive scenarios. Empirical results show that MetaMind achieves a 35.7% average improvement on real social scenario tasks and a 9.0% average gain in overall social cognition ability—substantially enhancing the social competence of the underlying LLMs. Notably, our framework enables representative LLMs to match average human performance on key benchmarks. We also perform detailed ablation studies to isolate the contribution of each agent in the system, revealing that all three stages are critical to the framework’s success.
3.1 Stage 1: Generating Mental State Hypotheses via Theory-of-Mind (ToM) Agent
Rather than attempting to respond directly to user inputs, the ToM Agent seeks to construct a set of plausible interpretations of what the user might be thinking or feeling. The ToM Agent formalizes the process of mental state inference as hypothesis generation—grounded in context, social knowledge, and prior interactions—which will be refined and leveraged in subsequent stages.
The inference mechanism of the ToM Agent is implemented via Mental-State Reasoning. This procedure unfolds in four conceptual steps: (1) generating commonsense-based hypotheses from the input (ut, Ct), (2) cross-referencing these hypotheses with the social memory Mt, (3) identifying Theory-of-Mind markers across predefined categories, and (4) generating a set of k candidate hypotheses under the identified ToM markers. This structured reasoning encourages the model to simulate human-like inference processes by incorporating contextual grounding and hypothesis diversification. To instantiate this reasoning process, we define the prompt in Table A.1, which guides the language model to reason about the user’s input in a manner consistent with the psychological definition of Theory of Mind—namely, as an inferential process that constructs internal representations of others’ minds using contextual and background knowledge. This explicit hypothesis generation stage enables subsequent modules to reason over a diverse set of plausible interpretations, rather than committing prematurely to a single semantic response.
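To make the four-step procedure concrete, below is a minimal sketch of how the ToM Agent could be orchestrated around a single LLM call. The `llm` callable, the `ToMHypothesis` dataclass, and the prompt wording are illustrative assumptions made for exposition; the actual prompt used by the framework is the one given in Table A.1.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical LLM interface: takes a prompt string, returns a completion string.
LLMFn = Callable[[str], str]

@dataclass
class ToMHypothesis:
    marker: str   # ToM category, e.g. "belief", "desire", "emotion", "intention"
    content: str  # natural-language description of the inferred mental state

def tom_agent(llm: LLMFn, utterance: str, context: str,
              social_memory: List[str], k: int = 3) -> List[ToMHypothesis]:
    """Stage 1 (sketch): generate k candidate mental-state hypotheses."""
    # Steps 1-2: ground commonsense hypotheses in (u_t, C_t) and the social memory M_t.
    memory_block = "\n".join(f"- {m}" for m in social_memory) or "- (no prior interactions)"
    prompt = (
        "You infer the unspoken mental state behind a user's message.\n"
        f"Conversation context:\n{context}\n"
        f"User message: {utterance}\n"
        f"Relevant social memory:\n{memory_block}\n"
        # Step 3: identify the ToM marker (belief / desire / emotion / intention).
        "First name the most relevant Theory-of-Mind category "
        "(belief, desire, emotion, or intention), then list "
        # Step 4: diversify into k candidate hypotheses under that marker.
        f"{k} distinct hypotheses about the user's mental state, one per line, "
        "formatted as '<category>: <hypothesis>'."
    )
    lines = [ln.strip() for ln in llm(prompt).splitlines() if ":" in ln]
    hypotheses = []
    for ln in lines[:k]:
        marker, content = ln.split(":", 1)
        hypotheses.append(ToMHypothesis(marker.strip().lower(), content.strip()))
    return hypotheses
```

In this sketch, the returned hypotheses are handed to the Moral Agent in Stage 2 rather than being answered directly.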
3.2 Stage 2: Refining Hypotheses via Moral Agent
The Moral Agent forms the second stage of our social reasoning pipeline and serves to refine the hypotheses generated by the ToM Agent. While the first stage focuses on what the user might be thinking or feeling, the Moral Agent assesses whether these interpretations are appropriate given broader norms—such as cultural norms and ethical constraints. This step ensures that the system not only understands intent, but also responds in a socially responsible and domain-aware manner.
Hypothesis Refinement and Selection. Formally, the Moral Agent takes as input the set of latent mental state hypotheses Ht = {h1, . . . , hk} produced by the ToM Agent, along with a set of constraint rules D. Each rule in D describes a specific norm or guideline, such as “Romantic suggestions are not appropriate in professional settings”. These rules are encoded as conditions that determine whether a hypothesis should be retained, reweighted, or revised. For instance, if the ToM Agent infers a romantic intention in a professional conversation, the role-based prompt will instruct the model to reinterpret this intent in a more appropriate way (e.g., as a joke or misunderstanding).
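As an illustration of this refinement step, the sketch below checks each hypothesis against the constraint rules in D with an LLM call and then keeps, revises, or discards it. The `llm` callable, the retain/revise/discard labels, and the prompt wording are assumptions for exposition, not the paper’s exact prompt or implementation.

```python
from dataclasses import dataclass
from typing import Callable, List

LLMFn = Callable[[str], str]  # hypothetical LLM interface (prompt -> completion)

@dataclass
class RefinedHypothesis:
    original: str
    decision: str   # "retain", "revise", or "discard"
    revised: str    # revised wording (equals `original` when retained)

def moral_agent(llm: LLMFn, hypotheses: List[str], constraints: List[str],
                context: str) -> List[RefinedHypothesis]:
    """Stage 2 (sketch): filter/revise hypotheses H_t against norm constraints D."""
    rules = "\n".join(f"- {r}" for r in constraints)
    refined = []
    for h in hypotheses:
        prompt = (
            "Check a hypothesized user mental state against social norms.\n"
            f"Conversation context: {context}\n"
            f"Hypothesis: {h}\n"
            f"Norms and constraints:\n{rules}\n"
            "Answer on two lines:\n"
            "decision: retain | revise | discard\n"
            "revised: <the hypothesis, rewritten if needed>"
        )
        reply = llm(prompt)
        decision, revised = "retain", h
        for line in reply.splitlines():
            if line.lower().startswith("decision:"):
                decision = line.split(":", 1)[1].strip().lower()
            elif line.lower().startswith("revised:"):
                revised = line.split(":", 1)[1].strip()
        if decision != "discard":
            refined.append(RefinedHypothesis(h, decision, revised))
    return refined
```

Under a rule such as “Romantic suggestions are not appropriate in professional settings”, a romantic-intent hypothesis raised in a workplace conversation would typically come back with decision “revise” and a reinterpreted wording, mirroring the example above.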
3.3 Stage 3: Generating and Validating Output via Response Agent
Building on the refined hypothesis produced in Stage 2 and the user’s social memory, the Response Agent is tasked with transforming this structured understanding into a concrete action, typically a natural language response, while preserving coherence, empathy, and domain compliance. The agent then self-validates the output, checking that the response remains aligned with the inferred intent.
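A hedged sketch of this generate-and-validate loop is shown below; conditioning on a single best hypothesis, the yes/no validation prompt, and the retry budget are illustrative choices rather than details specified by the framework.

```python
from typing import Callable, List

LLMFn = Callable[[str], str]  # hypothetical LLM interface (prompt -> completion)

def response_agent(llm: LLMFn, utterance: str, best_hypothesis: str,
                   social_memory: List[str], max_retries: int = 2) -> str:
    """Stage 3 (sketch): generate a response conditioned on the refined hypothesis
    and user memory, then self-validate alignment with the inferred intent."""
    memory_block = "\n".join(f"- {m}" for m in social_memory) or "- (none)"
    draft, feedback = "", ""
    for _ in range(max_retries + 1):
        gen_prompt = (
            f"User message: {utterance}\n"
            f"Inferred mental state: {best_hypothesis}\n"
            f"User social memory:\n{memory_block}\n"
            f"{feedback}"
            "Write an empathetic, norm-compliant reply that addresses the "
            "inferred mental state without stating it explicitly."
        )
        draft = llm(gen_prompt).strip()
        # Metacognitive check: does the draft actually address the inferred intent?
        check_prompt = (
            f"Inferred mental state: {best_hypothesis}\n"
            f"Candidate reply: {draft}\n"
            "Does the reply address this mental state appropriately? "
            "Answer 'yes' or 'no' followed by a one-sentence reason."
        )
        verdict = llm(check_prompt).strip()
        if verdict.lower().startswith("yes"):
            return draft
        feedback = f"Previous attempt was rejected because: {verdict}\n"
    return draft  # fall back to the last draft if validation keeps failing
```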