Why do LLMs fail when simulating agents with private information?
Explores whether single-model control of all social participants masks fundamental limitations in how LLMs handle information asymmetry and genuine uncertainty about others' knowledge.
Most LLM social simulations use a single model to generate all participants — an omniscient perspective fundamentally at odds with how real social interaction works. When evaluated against non-omniscient settings that preserve information asymmetry, LLMs struggle.
The "Is this the real life?" evaluation framework (2024) demonstrates this by comparing omniscient simulation (one LLM controls all parties) against non-omniscient simulation (separate LLM instances with private information). The performance gap is systematic: models that appear socially competent in omniscient mode fail when they must reason under genuine uncertainty about what the other party knows, wants, or intends.
This matters because real social interaction is defined by information asymmetry. In SOTOPIA's scenarios, agents have shared context but private goals — "Your goal is to buy the chair for $80" is visible only to the buyer. The Secret dimension (what agents must hide) directly requires information management that omniscient models bypass entirely.
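To make the asymmetry concrete, here is a sketch of how a scenario of this shape can be split into per-agent views. The dictionary layout, the seller's goal, and the secrets are invented for illustration; this is not SOTOPIA's actual schema or data:

```python
# Hypothetical scenario record in the SOTOPIA style; field values are made up.
scenario = {
    "shared_context": "A buyer and a seller negotiate over a used chair.",
    "agents": {
        "buyer":  {"goal": "Buy the chair for $80", "secret": "You could pay up to $110."},
        "seller": {"goal": "Sell the chair for at least $100", "secret": "You need the cash today."},
    },
}

def agent_view(scenario: dict, agent_id: str) -> dict:
    # The information state one agent actually has: the shared context plus its
    # own goal and secret. The other party's private fields are deliberately
    # absent, so reasoning about them is inference, not lookup.
    me = scenario["agents"][agent_id]
    return {
        "shared_context": scenario["shared_context"],
        "my_goal": me["goal"],
        "my_secret": me["secret"],
    }
```

A single omniscient model is effectively handed both views at once, which is precisely what the Secret dimension is supposed to rule out.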
The implication for persona simulation research is direct. Work such as "Can AI agents learn people better from interviews than surveys?" reports high simulation fidelity, but if that fidelity was measured under omniscient conditions, it overstates real-world applicability. As "Do language models actually build shared understanding in conversation?" argues, the failure under information asymmetry is predictable: models that skip grounding work will fail precisely when grounding is most needed, when the parties hold genuinely different information.
And as "Why do language models skip the calibration step?" notes, non-omniscient simulation demands exactly the dynamic grounding that LLMs systematically lack.
Source: Social Theory Society
Related concepts in this collection
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  Connection: the mechanism; omniscient simulation lets models skip grounding work entirely.
- Why do language models skip the calibration step?
  Current LLMs assume shared understanding rather than building it through dialogue. This explores why that design choice persists and what breaks when it fails.
  Connection: non-omniscient settings demand the dynamic mode.
- Can AI agents learn people better from interviews than surveys?
  Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.
  Connection: simulation fidelity may overstate real-world capacity if measured under omniscient conditions.
- How do we generate realistic personas at population scale?
  Current LLM-based persona generation relies on ad hoc methods that fail to capture real-world population distributions. The challenge is reconstructing the joint correlations between demographic, psychographic, and behavioral attributes from fragmented data.
  Connection: another mechanism producing simulation overconfidence.
Original note title: omniscient social simulation fails under real-world information asymmetry because single-model control eliminates distributed cognition