Does an LLM commit to a single character or maintain many?
Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. Matters because it changes how we interpret apparent inconsistencies in model behavior.
The simple role-play metaphor — one actor, one part — is too rigid for what LLMs actually do. Shanahan refines it using Janus's simulator framing: the LLM is a non-deterministic simulator capable of generating an infinity of characters (simulacra), and at any point during a conversation it maintains a superposition of simulacra consistent with the preceding context. The superposition narrows as the conversation proceeds: each new turn rules out characters inconsistent with what has been said, concentrating probability on an ever-smaller set.
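The narrowing can be sketched as filtering over a set of candidate characters: each turn discards candidates inconsistent with the conversation so far, and a generation samples from whatever remains. This is a minimal illustration only; the characters, traits, and the `narrow` helper are invented for the sketch, not taken from the paper.

```python
import random

# Hypothetical candidate simulacra, each with traits it will not contradict.
# All names and traits here are illustrative.
CANDIDATES = [
    {"name": "cautious scientist", "traits": {"formal", "hedging"}},
    {"name": "playful poet", "traits": {"informal", "metaphorical"}},
    {"name": "terse engineer", "traits": {"formal", "terse"}},
]

def narrow(candidates, observed_trait):
    """Each new turn rules out characters inconsistent with what was said."""
    return [c for c in candidates if observed_trait in c["traits"]]

# Suppose the conversation so far has exhibited a formal register...
superposition = narrow(CANDIDATES, "formal")

# ...two characters remain consistent. A generation is a sample from the
# surviving set, not the reveal of a previously committed identity.
sampled = random.choice(superposition)
```

Regenerating the last step can return a different surviving character, which mirrors the regeneration phenomenon described below: nothing "changed its mind", a different consistent point was sampled.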
The distributional view is more than a refinement — it changes the ontological picture. Under simple role-play, there is one character the system is playing, and the question is what that character's properties are. Under the superposition view, there is no single character until the conversation has proceeded far enough to collapse the distribution to near-determinacy. The system is simultaneously consistent with many characters, and the character that appears in any particular generation is a sample from the current distribution, not a reveal of a committed identity.
This explains observable phenomena that the single-character view cannot. When a user regenerates the model's output, the second generation may present a meaningfully different personality, stance, or knowledge state — while remaining consistent with the conversation so far. The system did not change its mind; it sampled a different point from the distribution. The 20-questions test formalizes this: the agent never "thought of" an object; it maintained a set of objects consistent with prior answers and generated one on the fly at the reveal, and may generate a different consistent one if asked again.
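The 20-questions test admits a short sketch: the agent keeps only the set of objects consistent with its yes/no answers so far, and commits to a concrete object only at the reveal. The universe of objects and the questions below are illustrative assumptions, not examples from the paper.

```python
import random

def reveal_object(universe, answers):
    """The agent never 'thought of' an object: it keeps the set consistent
    with prior answers and samples one only when forced to reveal."""
    consistent = [obj for obj in universe
                  if all(pred(obj) == ans for pred, ans in answers)]
    return random.choice(consistent)  # a fresh sample each time it is asked

# Illustrative universe and question/answer history (hypothetical).
universe = ["apple", "banana", "carrot", "dog"]
answers = [
    (lambda o: o in {"apple", "banana", "carrot"}, True),  # "Is it edible?" -> yes
    (lambda o: o.startswith(("a", "b")), True),            # "Does it start with a or b?" -> yes
]

reveal = reveal_object(universe, answers)  # either "apple" or "banana"
```

Asking again re-samples from the same consistent set, so a second reveal can legitimately differ from the first while contradicting none of the earlier answers.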
Source: Shanahan, McDonell & Reynolds, Role-Play with Large Language Models (May 2023); drawing on Janus (2022)
Related concepts in this collection

- Do large language models actually commit to a single character? Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work. Relation: the empirical demonstration of superposition.
- Should we treat dialogue agents as role-playing characters? Does the role-play framing successfully avoid anthropomorphism while preserving folk-psychological vocabulary for describing LLM behavior? This matters because it shapes whether we attribute genuine mental states to dialogue systems. Relation: the simple role-play view this refines.
Original note title: an LLM is a non-deterministic simulator that maintains a superposition of simulacra rather than committing to a single character