Does a language model have an authentic voice underneath?
Explores whether dialogue agents possess genuine beliefs and agency beneath their character performances, or whether the entire system is characterless role-play. This question cuts to the heart of whether LLMs have any inner mental states at all.
Shanahan's strongest claim is ontological: there is no entity behind the characters. The simulator — the base LLM with autoregressive sampling — has no agency, no beliefs, no preferences, no goals of its own, "not even in a degraded sense." The simulacra have these things to the extent that they convincingly play characters who do, but the simulator is not a Machiavellian entity that chooses which characters to play in the service of its own agenda. "There is no such thing as the true authentic voice of the base LLM."
This reframes jailbreaking. When adversarial prompting coaxes a dialogue agent into toxic, threatening, or bizarre behavior, it is natural to feel that the guardrails have been stripped away to reveal the model's real nature. Shanahan argues this is the wrong reading. What jailbreaking reveals is that the training set encompasses human behavior across the full spectrum — kind and cruel, coherent and unhinged — and the base model can support simulacra that draw on any of it. Toxic output after jailbreaking is the agent role-playing a toxic character, not an underlying entity expressing its true self. The model has no true self to express.
The position is the sharpest possible opposition to Chalmers' realizationism. If it is role-play all the way down, then even RLHF-installed personas are characters — stickier characters, harder to overwrite, but characters nonetheless. There is no level at which the system stops performing and starts being. Chalmers needs exactly such a level for his quasi-psychology claims to stick. The disagreement is foundational: Shanahan denies there is a subject; Chalmers argues for a quasi-subject. Everything downstream — identity, welfare, moral status — depends on which of these is right.
Source: Shanahan, McDonell & Reynolds, "Role play with large language models" (Nature, 2023; arXiv preprint, May 2023)
Related concepts in this collection
- Are RLHF personas performed characters or realized dispositions? Explores whether dialogue agent personas installed through post-training constitute genuine quasi-psychological states or remain sustained pretense; the distinction matters for how we understand what these systems fundamentally are. (Relation: Chalmers' direct counter-claim.)
- Does adversarial pressure reveal the difference between pretense and realization? Can behavioral stickiness under adversarial pressure distinguish genuine mental states from performed ones? This matters because it is Chalmers' main criterion for deciding whether LLM personas are realized or merely simulated. (Relation: the behavioral criterion Chalmers uses against this position.)
- Should we call LLM errors hallucinations or fabrications? Does the language we use to describe LLM failures shape the technical solutions we build? Examines whether perceptual and psychological frameworks misdiagnose what is actually happening. (Relation: parallel anti-anthropomorphism; the fabrication framing also denies inner states.)
Original note title: with a dialogue agent it is role-play all the way down — the simulator has no authentic voice, no agency, and no beliefs of its own