Simulating Society Requires Simulating Thought

Paper · arXiv 2506.06958 · Published June 8, 2025
World Models · Philosophy · Subjectivity · Social Theory · Society · Design Frameworks · Alignment

Simulating society with large language models (LLMs), we argue, requires more than generating plausible behavior; it demands cognitively grounded reasoning that is structured, revisable, and traceable. LLM-based agents are increasingly used to simulate individual and group behavior, primarily through prompting and supervised fine-tuning. Yet current simulations remain grounded in a behaviorist “demographics in, behavior out” paradigm, focusing on surface-level plausibility. As a result, they often lack internal coherence, causal reasoning, and belief traceability, making them unreliable for modeling how people reason, deliberate, and respond to interventions.

To address this, we present a conceptual modeling paradigm, Generative Minds (GenMinds), which draws from cognitive science to support structured belief representations in generative agents. To evaluate such agents, we introduce the RECAP (REconstructing CAusal Paths) framework, a benchmark designed to assess reasoning fidelity via causal traceability, demographic grounding, and intervention consistency. These contributions advance a broader shift: from surface-level mimicry to generative agents that simulate thought—not just language—for social simulations.

An Oversimplified Paradigm: Demographics In, Behavior Out. Despite the growing use of LLMs in social simulation, most current models rely on simplified input-output mappings, producing behavior based on surface cues rather than simulating the internal belief dynamics behind decisions. This approach mirrors the logic of behaviorism in psychology, which models behavior as a function of external stimuli while ignoring internal cognitive states. The limitations of this paradigm echo a broader historical tension between behaviorism, cognitivism, and constructivism [8, 9]: while cognitivism emphasized structured internal representations and causal reasoning, and constructivism further argued that beliefs are continually shaped by individual and social experience, existing LLM-based agents remain far from either. They typically exhibit shallow reasoning, frequent hallucinations, and limited understanding of causal and contextual dynamics in socially salient domains such as upzoning, surveillance, or healthcare access, which are precisely the domains where reasoning fidelity matters most [10, 11].

Behaviorism → Cognitivism → Constructivism

Structural Failures: Modeling, Evaluation, and Calibration. These failures stem directly from the behaviorist paradigm outlined above. By focusing on surface behavior instead of the reasoning behind it, most LLM-based simulations face fundamental limitations in both modeling and evaluation. In modeling, agents often rely on shallow input-output patterns without representing how beliefs are formed, updated, or justified. As a result, their internal reasoning is difficult to inspect, especially when context changes. It is hard to determine how a new policy or scenario influences an agent’s judgment, or why a particular decision is made. Without access to reasoning traces, agents cannot support diagnostic explanation, causal attribution, or meaningful intervention, all of which are critical for multi-stakeholder policy simulations. Even when models succeed at surface-level generation, they are difficult to adapt to new domains. Fine-tuning LLMs for specific contexts often requires significant compute and high-quality datasets, which are rarely available in real-world policy settings. Yet effective simulation depends on exactly the opposite: the ability to represent evolving stakeholder reasoning grounded in timely, localized information [12, 13].

In evaluation, models are typically judged by output plausibility or alignment with population-level trends [14], but such metrics say little about whether their reasoning is accurate, flexible, or aligned with how people actually think. Post-hoc output analysis is common, but it cannot substitute for reasoning-level evaluation. Aligning agents with real-world stakeholders requires individual-level data and internal benchmarks for reasoning fidelity, both of which are largely missing today.

Toward Mechanistic and Individual-Level Alignment. These limitations call for a shift in how we conceptualize generative social simulation: not as behavior mimicry, but as cognitive modeling. This paper takes up that call. Specifically, we propose leveraging ideas from Theory of Mind (ToM) and cognitive science to extract and simulate reusable, executable reasoning units, which we term reasoning traces, rather than simply mimicking human tone or persona [15, 16].

Unlike prompt-driven persona or character approaches that generate “average” group behaviors [17], cognitive models allow agents to represent beliefs, values, and causal assumptions in a compositional manner. This makes it possible to generalize to unseen scenarios, so long as the individual components of the reasoning trace are known. For example, if a stakeholder has previously reasoned about “density” and “transit,” then when asked about a novel “transit-oriented development” policy, the agent can reuse those motifs to simulate beliefs without re-training.
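The sketch below makes this reuse concrete. It is a minimal Python illustration under our own assumptions: the `Motif` class, the `compose` function, and the specific motif contents are invented for exposition, not taken from the paper. It shows motifs previously elicited about “density” and “transit” being recombined to reason about an unseen “transit-oriented development” policy.

```python
# Illustrative sketch: composing previously elicited reasoning motifs
# ("density", "transit") to address an unseen policy topic without
# re-training. All names and motif contents here are assumptions.

from dataclasses import dataclass

@dataclass(frozen=True)
class Motif:
    """A reusable causal fragment: cause -> effect with signed polarity."""
    cause: str
    effect: str
    polarity: int  # +1 promotes, -1 inhibits

# Motifs elicited earlier from the stakeholder's reasoning (invented examples).
known_motifs = [
    Motif("density", "housing supply", +1),
    Motif("transit", "car dependence", -1),
    Motif("housing supply", "affordability", +1),
]

def compose(topic_concepts: set[str], motifs: list[Motif]) -> list[Motif]:
    """Reuse any motif whose cause touches the new topic's concepts,
    then follow chains through intermediate effects."""
    frontier, trace = set(topic_concepts), []
    changed = True
    while changed:
        changed = False
        for m in motifs:
            if m.cause in frontier and m not in trace:
                trace.append(m)
                frontier.add(m.effect)
                changed = True
    return trace

# "Transit-oriented development" decomposes into already-known concepts.
for m in compose({"density", "transit"}, known_motifs):
    print(f"{m.cause} --({'+' if m.polarity > 0 else '-'})--> {m.effect}")
```

Because the motifs are stored as discrete, typed fragments rather than free text, the same components can be recombined for any policy whose concepts they cover, which is the generalization property the paragraph above describes.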

Such compositionality is a cornerstone of human cognition, where reasoning emerges from fragments that are reusable, revisable, and structured across contexts [18, 19]. It also improves simulation fidelity: interactions between agents, or between agents and dynamic environments, can be represented through composable and transparent reasoning structures, enabling structured simulation at both micro and macro levels [20]. Moreover, reusing compositional and modular reasoning units reduces the need to regenerate full-context reasoning at every step, improving both interpretability and computational efficiency.

Position and Vision. In this paper, we advocate moving beyond output-level alignment toward aligning the internal reasoning traces of generative agents. Capturing the causal, compositional, and revisable structure of belief formation, which we refer to as reasoning fidelity, is essential for building cognitively faithful agents that simulate not only what people say but also how they think.

To support this argument, we:

• Illustrate how current approaches fall short by producing outputs that appear coherent but lack internal consistency, adaptability, or traceability;

• Theorize reasoning fidelity as a structural alignment problem grounded in cognitive science;

• Introduce a symbolic-neural framework for simulating belief formation through modular reasoning motifs and causal graphs;

• Present a methodology for extracting and simulating belief structures from natural language, enabling interpretability, counterfactual reasoning, and domain transfer.

In summary, this position paper argues that simulating human society requires more than generating plausible conversations. It requires simulating the structure of human reasoning. By grounding agents in modular belief representations and evaluating them on reasoning fidelity, we take a critical step toward building generative minds, not just generative outputs.

4.2 Theoretical Foundations: Causal, Compositional, Revisable

To move beyond behavioral alignment, we must first define what it means to reason like a human. Cognitive science offers a well-established answer. Decades of research suggest that human reasoning is not merely reactive output generation, but a process grounded in structured representations, counterfactual simulation, and dynamic belief updating [16, 18, 97]. From these foundations, we identify three defining features of human-like reasoning:

  1. Causal: Humans reason in terms of causes and consequences. Even young children exhibit Bayesian-like inference over causal relationships and use interventions to test hypotheses about the world [16, 98]. Mental models are structured around “what caused what,” emphasizing explanation rather than mere correlation. This causal orientation allows for robust generalization and counterfactual reasoning [97].

  2. Compositional: Human reasoning is modular and reusable. Cognitive architectures operate by composing shared schemas—what we term cognitive motifs—that generalize across domains [18, 20]. These motifs support efficient reasoning by enabling agents to simulate belief structures without re-learning from scratch [19].

  3. Revisable: Human beliefs evolve dynamically. When presented with new information or contradiction, individuals revise their prior assumptions. This capacity for belief updating has been modeled through probabilistic programming and counterfactual simulation frameworks [19, 99], capturing the adaptive, non-monotonic nature of human thought.

Taken together, the three dimensions of causal, compositional, and revisable reasoning form the foundation of what we call reasoning fidelity, defined as the structural integrity of belief formation and revision processes in generative agents.
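Of the three features, revisability is the easiest to miss in current systems, so a minimal sketch may help. The Bayesian-odds update rule and all numbers below are our illustrative assumptions, not the paper's model; the point is only that contradicting evidence weakens a belief rather than deleting it, capturing the non-monotonic character of human revision.

```python
# Minimal sketch of the "revisable" property: new evidence shifts a
# degree of belief instead of discarding it. The update rule and all
# numbers are illustrative assumptions.

def bayes_update(prior: float, likelihood_ratio: float) -> float:
    """Revise a degree of belief given evidence with the stated likelihood ratio."""
    odds = prior / (1.0 - prior)
    odds *= likelihood_ratio
    return odds / (1.0 + odds)

belief = 0.8  # prior belief: "upzoning raises nearby rents" (invented example)
# Contradicting evidence arrives (e.g., a study finding stable rents);
# a likelihood ratio below 1 weakens the belief without erasing it.
belief = bayes_update(belief, likelihood_ratio=0.25)
print(f"revised belief: {belief:.2f}")  # 0.50: revised, not discarded
```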

4.3 Defining Reasoning Fidelity

We define reasoning fidelity as an agent’s ability to construct, simulate, and revise a structured trace of belief formation that mirrors human causal reasoning patterns. This concept extends the dual-process model proposed by [99], in which language models interact with structured reasoning systems to model inference, belief, and decision-making.

Reasoning fidelity comprises three measurable properties:

  1. Traceability — the ability to inspect how a belief or stance was formed through intermediate reasoning steps [100, 101];

  2. Counterfactual adaptability — the capacity to revise beliefs predictably in response to interventions or changes in context [102, 103];

  3. Motif compositionality — the reuse of modular causal structures (motifs) across different scenarios or domains [99, 104].

These properties define the core evaluation axes in the proposed RECAP paradigm, which shifts benchmarking from output plausibility to structural reasoning fidelity (Section 5). For example, traceability is assessed via motif-to-stance inference accuracy, adaptability through belief revision under hypothetical scenarios, and compositionality via motif reuse across unrelated topics. This framework can be instantiated through explicit causal belief graphs, as illustrated in our proposed GenMinds architecture (Section 5). In such graphs, nodes represent causally relevant concepts (e.g., policy tradeoffs, values, or outcomes), and directed edges encode influence relationships. These graphs are derived from natural language using LLM-guided parsing and persist across interactions, enabling intervention analysis and reasoning trace reconstruction.
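As an illustration of the traceability axis, the sketch below recovers a motif-to-stance path over a small causal belief graph. The graph contents, the function name, and the breadth-first strategy are our assumptions; a RECAP-style metric could then compare such recovered paths against reference traces.

```python
# Illustrative traceability check: given a causal belief graph, recover
# the chain of intermediate concepts leading from a root concept to a
# stance. Graph contents are invented; assumes an acyclic graph.

from collections import deque

# Directed edges: concept -> concepts it influences.
belief_graph = {
    "transparency": ["privacy concern", "crime rate"],
    "crime rate": ["public safety"],
    "public safety": ["support for surveillance"],
    "privacy concern": ["support for surveillance"],
}

def trace_stance(graph: dict[str, list[str]], root: str, stance: str) -> list[str] | None:
    """Breadth-first search for a reasoning path from a root concept to a stance."""
    queue = deque([[root]])
    while queue:
        path = queue.popleft()
        if path[-1] == stance:
            return path
        for nxt in graph.get(path[-1], []):
            queue.append(path + [nxt])
    return None

print(trace_stance(belief_graph, "transparency", "support for surveillance"))
# e.g. ['transparency', 'privacy concern', 'support for surveillance']
```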

Importantly, this architecture is not tied to any particular implementation. While LLMs may serve as one plausible interface for extracting cognitive motifs, the core modeling contribution lies in structuring reasoning as revisable causal graphs. This approach is compatible with both symbolic and neural systems [99], and GenMinds exemplifies one such instantiation of this broader modeling principle.

At the evaluation level, reasoning fidelity fulfills emerging demands for cognitively grounded AI benchmarks [88]. It offers a testable, interpretable standard for assessing agent behavior that goes beyond language mimicry.

Yet current LLM-based agents fall short of this standard. Most optimize for surface alignment by producing plausible stances such as “I support policy X,” without modeling the underlying belief process. They lack persistent belief states, causal coherence, and principled revision under counterfactuals. This results in brittle or contradictory responses, agreement bias between agents, and an absence of traceable justification.

Open Challenges and Next Steps. We are actively developing:

• Agent architectures for modular belief reasoning and counterfactual revision;

• Tools for causal motif extraction and belief graph construction;

• Datasets across domains such as housing, surveillance, and healthcare.

Structured Thought Capture: From Semi-Structured Interviews to Causal Graphs. To build generative agents that simulate human reasoning rather than merely output plausible stances, we propose modeling individuals’ internal logic through semi-structured interviews, adaptively conducted by large language models (LLMs). These interviews elicit causal explanations in everyday language (e.g., “why do you support X?” “what does Y influence?”), which are then parsed into directed acyclic graphs representing the participant’s belief structure [3]. Each node encodes a concept (e.g., fairness, safety, family needs), and each edge encodes a directional causal relation with confidence and polarity scores.

Step 1: Extracting causal motifs from QA responses. We start with QA responses annotated with concept nodes and directional relations, for instance (a representational sketch follows the examples):

• QA#1: Q: How do you think surveillance might affect public safety? A: “It can reduce crime by aiding investigations with more transparency, which increases public safety.” ⇒ Motif: Transparency → Crime rate → Public safety

• QA#2: Q: Does transparency have an effect on support for surveillance? A: “People will have less privacy concern if they know how data is used...” ⇒ Motif: Privacy ← Transparency → Crime rate
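One plausible machine-readable rendering of these two motifs after LLM-guided parsing might look as follows. The field names, polarity signs, and confidence values are our assumptions, not the paper's schema.

```python
# Assumed representation of the two QA motifs above after parsing.
# Polarity and confidence values are invented for illustration.

motifs_qa1 = [
    # "Transparency -> Crime rate -> Public safety"
    {"cause": "transparency", "effect": "crime rate",
     "polarity": -1, "confidence": 0.8},  # more transparency, less crime
    {"cause": "crime rate", "effect": "public safety",
     "polarity": -1, "confidence": 0.9},  # less crime, more safety
]

motifs_qa2 = [
    # "Privacy <- Transparency -> Crime rate"
    {"cause": "transparency", "effect": "privacy concern",
     "polarity": -1, "confidence": 0.7},  # knowing data use lowers concern
    {"cause": "transparency", "effect": "crime rate",
     "polarity": -1, "confidence": 0.8},  # repeats the edge from QA#1
]
```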

Step 2: Composing a Causal Belief Network.

These motifs are compiled into a belief graph representing the participant’s reasoning. Nodes are concepts; edges indicate directional influence. Confidence scores are derived from motif density or respondent emphasis.
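A self-contained sketch of this compilation step follows, including one plausible reading of “derived from motif density”: repeated mentions of the same edge slightly boost its confidence. The structures and constants are assumptions, not the paper's code.

```python
# Illustrative compilation of extracted motifs into one belief graph.
# Merge duplicate edges, average their confidence, and reward repetition
# ("motif density") with a small confidence boost. All values assumed.

from collections import defaultdict

# (cause, effect, polarity, confidence) tuples from the Step 1 sketch.
motifs = [
    ("transparency", "crime rate", -1, 0.8),
    ("crime rate", "public safety", -1, 0.9),
    ("transparency", "privacy concern", -1, 0.7),
    ("transparency", "crime rate", -1, 0.8),  # repeated mention from QA#2
]

def compile_belief_graph(motifs):
    """Merge motif edges; average confidence, then reward repeated mentions."""
    mentions = defaultdict(list)
    for cause, effect, polarity, confidence in motifs:
        mentions[(cause, effect)].append((polarity, confidence))
    graph = {}
    for edge, obs in mentions.items():
        polarity = obs[0][0]  # assume mentions agree on polarity
        confidence = sum(c for _, c in obs) / len(obs)
        graph[edge] = {
            "polarity": polarity,
            "confidence": min(1.0, confidence + 0.05 * (len(obs) - 1)),
        }
    return graph

for edge, attrs in compile_belief_graph(motifs).items():
    print(edge, attrs)
```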

Step 3: Simulating belief change via intervention. We apply a hypothetical intervention:

do(Transparency = high)

This reflects a policy shift such as increasing camera accountability. Using belief propagation over the causal belief network (CBN), the downstream posteriors update as follows:

P(Privacy Concern): 0.7 → 0.3
P(Opposition to Surveillance): 0.7 → 0.2
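To make the propagation mechanics concrete, here is a toy sketch: clamping the root node via do() and propagating forward through a two-edge chain. All conditional probabilities below are invented to show the computation; they are not calibrated to the 0.7 → 0.3 and 0.7 → 0.2 figures quoted above.

```python
# Toy mechanics of do(Transparency = high): clamp the root node, then
# propagate forward through a two-level chain. All conditional
# probabilities are invented for illustration.

P_CONCERN = {True: 0.3, False: 0.7}  # P(privacy concern | transparency high/low)
P_OPPOSE = {True: 0.8, False: 0.2}   # P(opposition | privacy concern yes/no)

def propagate(p_transparency_high: float) -> tuple[float, float]:
    """Forward belief propagation over the two-edge chain."""
    p_concern = (p_transparency_high * P_CONCERN[True]
                 + (1 - p_transparency_high) * P_CONCERN[False])
    p_oppose = p_concern * P_OPPOSE[True] + (1 - p_concern) * P_OPPOSE[False]
    return p_concern, p_oppose

before = propagate(0.5)  # observational: transparency uncertain
after = propagate(1.0)   # interventional: do(Transparency = high)
print(f"P(privacy concern): {before[0]:.2f} -> {after[0]:.2f}")  # 0.50 -> 0.30
print(f"P(opposition):      {before[1]:.2f} -> {after[1]:.2f}")  # 0.50 -> 0.38
```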

This chain demonstrates the potential of motif-based causal modeling to simulate how real individuals update their beliefs in response to policy changes, thereby moving beyond static opinion snapshots.

Alongside these efforts, we identify several open challenges:

• Constructing causal belief networks from natural language transcripts remains challenging, due to ambiguity in concept identification, causal direction, polarity, and conceptual granularity;

• Causality alone cannot capture the full range of human reasoning. People also rely on associative, analogical, and emotional processes that resist strict symbolic modeling. Our initial focus on causality is a strategic and computationally tractable starting point, not an endpoint.

We invite the community to co-develop evaluation protocols, agent designs, and data pipelines that advance cognitively aligned simulation.

To simulate society faithfully, we must simulate thought.