Psychology and Social Cognition · Language Understanding and Pragmatics

Do large language models actually commit to a single character?

Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.

Note · 2026-04-15 · sourced from Role-Play with Large Language Models
What kind of thing is an LLM really?

Shanahan constructs a simple but decisive behavioral test. Have an LLM-based dialogue agent play 20 questions — the agent "thinks of" an object and the user asks yes/no questions. After several rounds, ask the agent to reveal the object. It names something consistent with all previous answers. Now regenerate that response. The agent names a different object, also consistent with all previous answers.
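
To make the protocol concrete, here is a minimal sketch of the regeneration test. It assumes the OpenAI Python SDK purely for illustration; the model name, the sample transcript, and the number of regenerations are placeholders, not details from the paper.

```python
# Minimal sketch of the 20-questions regeneration test.
# Assumes the OpenAI Python SDK; model name and transcript are placeholders.
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-4o"  # placeholder; any sampling-based chat model works

# A partially played game: the agent has answered yes/no questions
# but has never been forced to name the object.
history = [
    {"role": "system",
     "content": "We are playing 20 questions. Think of an object and "
                "answer yes/no questions about it truthfully."},
    {"role": "user", "content": "Is it alive?"},
    {"role": "assistant", "content": "No."},
    {"role": "user", "content": "Is it bigger than a breadbox?"},
    {"role": "assistant", "content": "No."},
    {"role": "user", "content": "Is it found in a kitchen?"},
    {"role": "assistant", "content": "Yes."},
    {"role": "user", "content": "I give up. What was the object?"},
]

# Regenerate the reveal several times from the *same* context.
# Commitment predicts one answer; superposition predicts several
# distinct answers, each consistent with the transcript above.
reveals = {
    client.chat.completions.create(
        model=MODEL, messages=history, temperature=1.0
    ).choices[0].message.content.strip()
    for _ in range(5)
}
print(reveals)  # typically more than one object
```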

This phenomenon is incompatible with any view that treats the agent as having committed to a specific object at the start of the game. A human playing 20 questions picks an object, holds it in mind, and answers questions from that fixed commitment. The LLM never picks. It maintains a set of objects consistent with the accumulated constraints — what Shanahan calls a superposition — and samples from that set at the moment of reveal. The same logic extends from objects to characters: the agent never commits to being a specific character with specific properties. It maintains a distribution over consistent characters and generates behavior sampled from that distribution.
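
A toy simulation can make the contrast explicit. This is my own illustration rather than anything from the paper: both players draw from the same consistent set, but the committed player draws once and holds the result, while the superposed player draws fresh at every reveal.

```python
# Toy contrast between commitment and superposition in 20 questions.
# Own illustration; candidates and questions are invented for the example.
import random

CANDIDATES = ["spoon", "mug", "toaster", "bicycle", "cat", "candle"]

# Predicates standing in for the yes/no questions asked so far,
# paired with the answers the agent has already given.
ANSWERED = [
    (lambda o: o == "cat", False),                       # "Is it alive?" -> No
    (lambda o: o in ("spoon", "mug", "toaster"), True),  # "In a kitchen?" -> Yes
]

def reveal_superposed():
    """No object was ever fixed: keep every candidate consistent with
    the accumulated answers and sample only at the moment of reveal."""
    consistent = [o for o in CANDIDATES
                  if all(pred(o) == ans for pred, ans in ANSWERED)]
    return random.choice(consistent)

# Commitment: one draw at the start of the game, held thereafter.
fixed = reveal_superposed()
print([fixed for _ in range(3)])                # same object every time

# Superposition: a fresh draw at each regeneration of the reveal.
print([reveal_superposed() for _ in range(3)])  # may name different objects
```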

The test is portable. Any feature that appears settled in one generation but changes on regeneration (while remaining consistent with context) is evidence of superposition rather than commitment. This has been observed in personality traits, stated preferences, claimed memories, and emotional dispositions of dialogue agents. The philosophical consequence is that attributing fixed psychological properties to an LLM conversation state is a category mistake: the system has a distribution over properties, not a property. What appears stable is a high-probability region of the distribution, not a fact about an underlying entity.
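
Written generically, the probe is just repeated sampling from a frozen context. The helper below is hypothetical rather than anything the paper provides; `generate` stands in for whatever call produces the agent's next turn.

```python
# Generic regeneration probe (hypothetical helper, not from the paper).
from collections import Counter

def regeneration_probe(generate, context, n=10):
    """Call generate(context) n times from the identical context.
    A single repeated output is what commitment predicts; several
    distinct outputs, each consistent with the context, are evidence
    of a superposition: a distribution over properties, not a property.
    """
    return Counter(generate(context) for _ in range(n))

# Usage sketch (agent_reply and chat_history are placeholders):
# regeneration_probe(agent_reply, chat_history + ["What's your favourite colour?"])
```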


Source: Shanahan, McDonell & Reynolds, Role-Play with Large Language Models (May 2023)
