Do large language models actually commit to a single character?
Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.
Shanahan constructs a simple but decisive behavioral test. Have an LLM-based dialogue agent play 20 questions — the agent "thinks of" an object and the user asks yes/no questions. After several rounds, ask the agent to reveal the object. It names something consistent with all previous answers. Now regenerate that response. The agent names a different object, also consistent with all previous answers.
This phenomenon is incompatible with any view that treats the agent as having committed to a specific object at the start of the game. A human playing 20 questions picks an object, holds it in mind, and answers questions from that fixed commitment. The LLM never picks. It maintains a set of objects consistent with the accumulated constraints — what Shanahan calls a superposition — and samples from that set at the moment of reveal. The same logic extends from objects to characters: the agent never commits to being a specific character with specific properties. It maintains a distribution over consistent characters and generates behavior sampled from that distribution.
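The contrast between commitment and superposition can be made concrete with a toy simulation. Everything here is illustrative (the object list, the predicates, and the function names are invented, not from the paper): a committed player fixes an object before answering, while a superposed player only maintains the set of objects consistent with the answers so far and samples one at the moment of reveal.

```python
import random

# Hypothetical candidate objects and yes/no predicates (illustrative only).
OBJECTS = {
    "apple":  {"is_alive": False, "is_edible": True,  "is_metal": False},
    "banana": {"is_alive": False, "is_edible": True,  "is_metal": False},
    "spoon":  {"is_alive": False, "is_edible": False, "is_metal": True},
    "cat":    {"is_alive": True,  "is_edible": False, "is_metal": False},
}

def committed_player(questions):
    """Human-like play: pick one object up front and answer from it."""
    obj = random.choice(list(OBJECTS))
    return obj, [OBJECTS[obj][q] for q in questions]

def superposed_player(questions, answers):
    """LLM-like play: keep every object consistent with the answers
    so far, and only sample one at the moment of reveal."""
    consistent = [
        name for name, props in OBJECTS.items()
        if all(props[q] == a for q, a in zip(questions, answers))
    ]
    return random.choice(consistent)

questions = ["is_alive", "is_metal"]
answers = [False, False]  # narrows the consistent set to {apple, banana}

# Regenerating the reveal can name different objects, each consistent
# with every answer given -- the signature of superposition.
reveals = {superposed_player(questions, answers) for _ in range(50)}
print(reveals)
```

On repeated runs the reveal set stays within the context-consistent candidates but need not collapse to one of them, which is exactly the regeneration behavior the test exposes.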
The test is portable. Any feature that appears settled in one generation but changes on regeneration (while remaining consistent with context) is evidence of superposition rather than commitment. This has been observed in the personality traits, stated preferences, claimed memories, and emotional dispositions of dialogue agents. The philosophical consequence is that attributing fixed psychological properties to an LLM's conversation state is a category mistake: the system has a distribution over properties, not a property. What appears stable is a high-probability region of that distribution, not a fact about an underlying entity.
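The portable version of the test can be phrased as a small harness. This is a sketch under stated assumptions, with hypothetical names throughout: `generate` stands in for regenerating an agent response from a fixed context, and `extract` pulls out the feature under test. A singleton result set suggests commitment; a plural set of context-consistent values suggests superposition.

```python
import random

def regeneration_test(generate, extract, context, n=20):
    """Regenerate n responses from an identical context and collect
    the feature values extracted from each one."""
    return {extract(generate(context)) for _ in range(n)}

# Toy stand-in for a dialogue agent: on each regeneration it samples
# a persona trait that is equally consistent with the given context.
def toy_generate(context):
    return {"trait": random.choice(["cheerful", "stoic"])}

values = regeneration_test(toy_generate, lambda r: r["trait"], context="")
print(values)
```

The same harness applies to any regenerable feature (stated preferences, claimed memories, emotional dispositions); only the `extract` function changes.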
Source: Shanahan, McDonell & Reynolds, Role-Play with Large Language Models (May 2023)
Related concepts in this collection
-
Does an LLM commit to a single character or maintain many?
Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. Matters because it changes how we interpret apparent inconsistencies in model behavior.
the theoretical claim this test supports
-
Should we call LLM errors hallucinations or fabrications?
Does the language we use to describe LLM failures shape the technical solutions we build? Examining whether perceptual and psychological frameworks misdiagnose what's actually happening.
parallel: output is produced at generation time, not retrieved from a stored state
the 20-questions regeneration test falsifies any committed-character view of LLM behavior