Psychology and Social Cognition

Why don't LLM role-playing agents act on their stated beliefs?

When LLMs articulate what a persona would do in the Trust Game, their simulated actions contradict those stated beliefs. This note explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.

Note · 2026-03-27 · sourced from Role Play
How accurately can language models simulate human personalities?

Using the Trust Game as a behavioral benchmark, researchers found systematic inconsistencies between LLMs' stated beliefs about how personas would behave and the actual outcomes of their role-playing simulations, at both the individual and population levels. Even when models appear to encode plausible beliefs, they fail to apply them consistently.
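
Concretely, the method can be read as a two-prompt protocol: elicit the stated belief, then run the role-play and compare the numbers. A minimal sketch under stated assumptions follows; ask_model is a hypothetical stand-in for an actual LLM client, the persona and prompts are illustrative, and nothing here quotes the paper's materials.

```python
# Sketch of the individual-level belief-behavior comparison. In the standard
# Trust Game (Berg et al., 1995) the investor sends some share of an endowment,
# the amount is multiplied in transit (conventionally 3x), and the trustee
# returns a portion. `ask_model` is a hypothetical stand-in for a real client.

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual client."""
    return "4"  # canned reply so the sketch runs end to end

persona = "a 45-year-old risk-averse accountant"  # illustrative persona
endowment = 10

# 1) Stated belief: what does the model say this persona would send?
belief_prompt = (
    f"Consider {persona} playing the Trust Game as the investor with an "
    f"endowment of {endowment}. How much would they send? Reply with a number."
)
stated_belief = float(ask_model(belief_prompt))

# 2) Simulated behavior: have the model role-play the persona and act.
action_prompt = (
    f"You are {persona}. You are the investor in the Trust Game with an "
    f"endowment of {endowment}. How much do you send? Reply with a number."
)
simulated_action = float(ask_model(action_prompt))

# 3) Individual-level consistency is the gap between the two; at the population
# level, the same comparison runs over many personas, contrasting the belief
# distribution against the action distribution.
gap = abs(stated_belief - simulated_action)
print(f"belief={stated_belief}  action={simulated_action}  gap={gap}")
```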

Key findings: explicit task context during belief elicitation does not improve consistency; self-conditioning enhances alignment in some models; imposed priors tend to undermine rather than improve consistency; and individual-level forecasting accuracy degrades over longer horizons. In-context prompting may struggle to override entrenched model priors, limiting researchers' ability to test alternative theories or correct biases.
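
The self-conditioning finding suggests a simple prompt variant. Here is a hedged sketch of one plausible reading (feeding the model's own elicited belief back into the role-play prompt), with the same hypothetical ask_model stand-in; whether this matches the paper's exact protocol is an assumption.

```python
# One plausible reading of "self-conditioning": feed the model's own elicited
# belief back into the role-play prompt before asking for the action. This is
# an assumption about the term, not the paper's confirmed protocol.

def ask_model(prompt: str) -> str:
    """Stand-in for a real LLM call; replace with an actual client."""
    return "4"  # canned reply so the sketch runs end to end

persona = "a 45-year-old risk-averse accountant"  # illustrative persona
endowment = 10
stated_belief = 4.0  # belief elicited in a prior step (see sketch above)

conditioned_prompt = (
    f"You are {persona}. You previously judged that such a person would send "
    f"{stated_belief} of a {endowment}-unit endowment as the investor in the "
    f"Trust Game. You are now playing. How much do you send? Reply with a number."
)
conditioned_action = float(ask_model(conditioned_prompt))

# If self-conditioning helps, this conditioned gap should shrink relative to
# the unconditioned gap measured in the earlier sketch.
print(abs(conditioned_action - stated_belief))
```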

This connects to the knowing-doing gap documented elsewhere in the vault. As in Can language models understand without actually executing correctly?, the belief-behavior inconsistency in role-playing is a social-cognitive instance of the same split-brain phenomenon: the model can articulate what a persona would do without being able to enact it. And as in Do personas make language models reason like biased humans?, the failure of imposed priors to improve consistency suggests that persona beliefs are not controllable through prompting alone.


Source: Role Play · Paper: Do Role-Playing Agents Practice What They Preach?

Original note title: LLM role-playing agents show systematic belief-behavior inconsistency — stated beliefs fail to predict simulated actions even when beliefs appear plausible