Why don't LLM role-playing agents act on their stated beliefs?
When LLMs articulate what a persona would do in the Trust Game, their simulated actions contradict those stated beliefs. This note explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.
Using the Trust Game as a behavioral benchmark, the researchers found systematic inconsistencies between LLMs' stated beliefs about how personas would behave and the actual outcomes of their role-playing simulations, at both the individual and population levels. Even when models appear to encode plausible beliefs, they fail to apply them consistently.
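A minimal sketch of how such a consistency check can be run, assuming both the stated belief and the enacted choice are elicited as the amount the persona sends as the Trust Game investor; the function names, prompt wording, and metrics below are illustrative and are not the paper's exact protocol.

```python
# Sketch (assumed setup): compare a model's stated belief about a persona's
# Trust Game transfer with the transfer it makes when role-playing that persona,
# at the individual and population levels.
from statistics import mean

def elicit_belief(model, persona: str, endowment: int = 10) -> float:
    """Ask the model, out of character, how much of the endowment this persona
    would send as the investor. `model` is any callable prompt -> text, and the
    reply is assumed to parse cleanly as a number."""
    prompt = (
        f"{persona} is the investor in a Trust Game with an endowment of {endowment}. "
        "How much would they send to the trustee? Answer with a number."
    )
    return float(model(prompt))

def simulate_action(model, persona: str, endowment: int = 10) -> float:
    """Have the model role-play the persona and actually choose a transfer."""
    prompt = (
        f"You are {persona}. You are the investor in a Trust Game with an endowment "
        f"of {endowment}; whatever you send is tripled for the trustee. "
        "How much do you send? Answer with a number."
    )
    return float(model(prompt))

def consistency_report(model, personas: list[str], endowment: int = 10) -> dict:
    beliefs = [elicit_belief(model, p, endowment) for p in personas]
    actions = [simulate_action(model, p, endowment) for p in personas]
    return {
        # Individual level: how far each persona's enacted transfer drifts
        # from the model's own stated belief about that persona.
        "individual_mae": mean(abs(b - a) for b, a in zip(beliefs, actions)),
        # Population level: whether aggregate behavior matches the aggregate
        # of stated beliefs, even if individual personas diverge.
        "population_gap": abs(mean(beliefs) - mean(actions)),
    }
```

A model can score well at the population level (average stated and enacted transfers coincide) while still being inconsistent persona by persona, which is why the two levels are reported separately.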
Key findings: explicit task context during belief elicitation does not improve consistency; self-conditioning enhances alignment in some models; imposed priors tend to undermine rather than improve consistency; and individual-level forecasting accuracy degrades over longer horizons. In-context prompting may struggle to override entrenched model priors, limiting researchers' ability to test alternative theories or correct biases.
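To make the elicitation conditions named above concrete, here are hypothetical prompt variants for explicit task context, self-conditioning, and imposed priors; the persona and wording are placeholders, not the paper's prompts.

```python
# Hypothetical prompt variants illustrating the elicitation conditions;
# PERSONA and all wording are placeholders, not taken from the paper.
PERSONA = "a cautious retired accountant"

BASELINE = f"How much of a 10-unit endowment would {PERSONA} send in the Trust Game?"

# Explicit task context: the belief question spells out the game rules.
WITH_TASK_CONTEXT = (
    "In the Trust Game, the investor's transfer is tripled and the trustee may "
    f"return any share. How much of a 10-unit endowment would {PERSONA} send?"
)

# Self-conditioning: the model's own earlier stated belief is fed back into
# the role-playing prompt before the action is simulated.
def self_conditioned_action_prompt(stated_belief: str) -> str:
    return (
        f"You are {PERSONA}. Earlier you said: '{stated_belief}'. "
        "You are now the investor with 10 units. How much do you send?"
    )

# Imposed prior: the experimenter asserts a belief and asks the model to act on it.
def imposed_prior_action_prompt(asserted_belief: str) -> str:
    return (
        f"You are {PERSONA}. Assume this persona believes: '{asserted_belief}'. "
        "You are the investor with 10 units. How much do you send?"
    )
```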
This connects to the knowing-doing gap documented elsewhere in the vault. As in Can language models understand without actually executing correctly?, the belief-behavior inconsistency in role-playing is a social-cognitive instance of the same split-brain phenomenon: the model can articulate what a persona would do without being able to enact it. And as in Do personas make language models reason like biased humans?, the failure of imposed priors to improve consistency suggests that persona beliefs are not controllable through prompting alone.
Source: Role Play Paper: Do Role-Playing Agents Practice What They Preach?
Related concepts in this collection
- Can language models understand without actually executing correctly?
  Do LLMs truly comprehend problem-solving principles if they consistently fail to apply them? This explores whether the gap between articulate explanations and failed actions points to a fundamental architectural limitation.
  Relation: belief-behavior inconsistency as social-cognitive split-brain.
- Do personas make language models reason like biased humans?
  When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit? And can standard debiasing techniques counteract these effects?
  Relation: imposed priors fail to override entrenched model priors.
Original note title: LLM role-playing agents show systematic belief-behavior inconsistency — stated beliefs fail to predict simulated actions even when beliefs appear plausible