INQUIRING LINE

How does the dialogue prompt establish the character the model plays?

This explores Murray Shanahan's account of how a dialogue prompt sets up the 'character' an LLM performs — and the live debate over whether that character is a momentary costume or something the model actually has.


The corpus has a surprisingly sharp answer to this question, plus an argument about what it means. The clearest statement comes from the role-play framework: the prompt does the casting. You hand the model an opening (a name, a stance, a scene) and the model produces continuations that stay consistent with it, the way an improv actor takes a premise and runs with it (see: Should we treat dialogue agents as role-playing characters?). Crucially, on this account there is no actor underneath the role. The base model is a 'characterless engine': pure simulation with no authentic voice waiting to be unmasked, which is why jailbreaks don't reveal a hidden true self, just other regions of the training data (see: Does a language model have an authentic voice underneath?).
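The casting step can be made concrete with a small sketch. It assumes a plain-text base model that simply continues whatever it is given; the `DialoguePrompt` class, its field names, and the character "Aria" are all hypothetical and chosen only for illustration. The point is the shape of the frame: a preamble that states the scene and stance, the transcript so far, and a trailing speaker label that leaves the character's next line as the most natural continuation.

```python
from dataclasses import dataclass, field


@dataclass
class DialoguePrompt:
    """A static frame that casts the model as a named character (illustrative only)."""

    character: str                 # hypothetical name of the role to be continued
    stance: str                    # one-line description of how the character behaves
    user_label: str = "User"
    turns: list[tuple[str, str]] = field(default_factory=list)

    def add_turn(self, speaker: str, text: str) -> None:
        """Append one (speaker, text) pair to the transcript."""
        self.turns.append((speaker, text))

    def render(self) -> str:
        # Preamble: sets the scene and the character's stance.
        lines = [
            f"This is a conversation between {self.user_label} and {self.character}.",
            f"{self.character} is {self.stance}.",
            "",
        ]
        # Transcript so far, one "Speaker: text" line per turn.
        lines += [f"{speaker}: {text}" for speaker, text in self.turns]
        # Trailing cue: the text to be continued is the character's next line.
        lines.append(f"{self.character}:")
        return "\n".join(lines)


prompt = DialoguePrompt(character="Aria", stance="a patient, factual assistant")
prompt.add_turn("User", "What is the capital of France?")
print(prompt.render())
```

Nothing in this frame refers to the model itself. Swapping the `stance` string recasts the role without touching anything else, which is the sense in which the prompt, not the engine, carries the character.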

The twist is that the prompt doesn't pin down a single character so much as it narrows a cloud of them. An LLM is better described as holding a superposition of possible simulacra and sampling one at generation time (see: Does an LLM commit to a single character or maintain many?). Shanahan's '20 questions' test makes this concrete: regenerate the same response and you get different outputs, each consistent with the prior context but not committed to one fixed entity. That undercuts the idea that the model 'is' the character it is currently playing (see: Do large language models actually commit to a single character?). So the prompt works less like a switch and more like a filter that thins the distribution of who the model might be talking as.
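The filter picture can be illustrated with a toy version of the 20-questions setup. This is not an LLM and not Shanahan's experiment: the candidate objects, their attributes, and the function names below are invented for the sketch. It only shows the logic of the claim, that answers already given narrow the set of consistent candidates, and that each regeneration of the final reveal samples from whatever remains.

```python
import random

# Hypothetical candidate set: each object is described by yes/no attributes.
CANDIDATES = {
    "cat": {"alive": True, "bigger_than_breadbox": False, "pet": True},
    "dog": {"alive": True, "bigger_than_breadbox": True, "pet": True},
    "oak": {"alive": True, "bigger_than_breadbox": True, "pet": False},
    "spoon": {"alive": False, "bigger_than_breadbox": False, "pet": False},
    "car": {"alive": False, "bigger_than_breadbox": True, "pet": False},
}


def consistent(context: dict[str, bool]) -> list[str]:
    """Return every candidate that agrees with all answers given so far."""
    return sorted(
        name
        for name, attrs in CANDIDATES.items()
        if all(attrs[question] == answer for question, answer in context.items())
    )


def reveal(context: dict[str, bool], rng: random.Random) -> str:
    """Sample one 'regeneration' of the final answer from the consistent set."""
    return rng.choice(consistent(context))


context = {"alive": True}  # answers already given earlier in the dialogue
rng = random.Random(0)
reveals = {reveal(context, rng) for _ in range(50)}

print(consistent(context))                    # candidates left after one answer
print(consistent({**context, "pet": True}))   # a second answer thins the set further
print(sorted(reveals))                        # regenerations differ but stay consistent
```

No single object is ever "chosen" in advance here. Every reveal is compatible with the context, and more context shrinks the set, which is the behaviour the superposition description is meant to capture.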

This reframes what a prompt even is. Rather than an utterance in an evolving conversation, the prompt bundles utterance, context, and role assignment into one static frame the model can't renegotiate mid-stream. You don't drift the character through cooperative back-and-forth; you re-prompt to recast (see: How do prompts reshape the role of context in AI conversation?). And the casting is unreliable in two directions. Run the same persona prompt repeatedly and the variation between runs can rival the variation between different personas, suggesting that the model's own uncertainty, not stable 'social knowledge,' is doing much of the steering (see: Why do LLM persona prompts produce inconsistent outputs across runs?). Worse, many open models resist the casting entirely, snapping back to a trained-in default temperament no matter what personality you prompt (see: Can open language models adopt different personalities through prompting?).
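One way to check the run-to-run claim on your own outputs is to compare variance within a persona (same prompt, repeated runs) against variance between personas (differences in their average scores). The sketch below assumes each run has already been reduced to a single numeric score; the function name, persona labels, and numbers are made up for illustration and are not taken from any of the cited notes.

```python
from statistics import mean, pvariance


def variance_split(runs_by_persona: dict[str, list[float]]) -> tuple[float, float]:
    """Return (within-persona variance, between-persona variance).

    within:  average of each persona's variance across its repeated runs
    between: variance of the per-persona mean scores
    """
    within = mean(pvariance(scores) for scores in runs_by_persona.values())
    between = pvariance([mean(scores) for scores in runs_by_persona.values()])
    return within, between


# Toy scores, invented for this example: three repeated runs per persona prompt.
toy_runs = {
    "persona_a": [1.0, 3.0, 5.0],
    "persona_b": [2.0, 4.0, 6.0],
}

within, between = variance_split(toy_runs)
print(f"within-persona variance:  {within:.3f}")
print(f"between-persona variance: {between:.3f}")
```

When the within-persona figure is comparable to or larger than the between-persona figure, as in these toy numbers, the persona label explains little of the spread and repeated sampling explains most of it.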

That resistance is the hinge for the opposing camp in the corpus. If a prompt can't fully overwrite the model's disposition, maybe the most durable 'character' was never installed by the prompt at all and was instead baked in during post-training. The realizationist view argues that RLHF produces not sustained pretense but a realized quasi-psychology: stable dispositions that survive adversarial pressure and persist across conversations, whereas prompt-induced roles collapse under jailbreaks (see: Are RLHF personas performed characters or realized dispositions?). On this account the dialogue prompt only ever conjures a thin, performed layer on top of a thicker, realized one (see: Are LLM personas realized or merely simulated through training?).

The useful thing to walk away with: the prompt's grip is real but shallow and contested. It can be reinforced, since role-aware constraints and reasoning-style training measurably restore character fidelity when models drift out of role (see: Why do reasoning models lose character consistency during role-playing?). It can even be split, with a single model branching into several prompted personas that behave like a multi-agent debate (see: Can branching prompts replicate what multi-agent systems do?). But whether you've truly 'set' a character or just biased a sampler over many candidates depends on whether you think the personality lives in the prompt or in the weights.


Sources (11 notes)

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Does a language model have an authentic voice underneath?

Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.

Does an LLM commit to a single character or maintain many?

Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

How do prompts reshape the role of context in AI conversation?

LLM prompts bundle utterance, context assignment, and role specification into a single static frame the model cannot renegotiate, unlike human dialogue where context evolves cooperatively. This makes mid-conversation pivots require explicit re-prompting rather than implicit adjustment.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Why do reasoning models lose character consistency during role-playing?

Large reasoning models exhibit attention diversion and style drift during role-playing, but the RAR method—using role-aware constraints and contrastive learning on reasoning style—recovers character fidelity across multiple benchmarks. Simply extending reasoning without guidance actively degrades persona consistency.

Can branching prompts replicate what multi-agent systems do?

Research shows single LLMs using dynamic persona simulation achieve multi-agent cognitive synergy without multiple model instances. Solo Performance Prompting indicates that structured prompting techniques map directly onto multi-agent debate architectures, so a single prompted model can reach comparable outcomes.

Next inquiring lines