What anchors a stable identity beneath an LLM's persona?
Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational?
Shanahan introduces the role-play framing to navigate between anthropomorphism and naive dismissal. An LLM playing a helpful assistant can be described using familiar folk-psychological terms — it "believes" its answers, "wants" to be helpful — without committing to the claim that these are genuine mental states. The role-play framing permits the vocabulary while marking its qualified status.
But the Simulacra paper makes a deeper claim: with LLMs, "it's role play all the way down." This is different from saying LLMs engage in role play. It means there is no stable substrate beneath the role play that would make "the person behind the mask" intelligible.
Humans are social chameleons. Goffman documented the way humans adopt different personas across social situations — front stage vs. back stage, different registers, different self-presentations. But even for the most extreme social chameleon, there is a stable biological self underneath: needs, drives, a developmental history, a body that persists across situations. We can always meaningfully speak of the person whose mask this is.
LLMs lack even the biological needs common to all animals. They are not embodied entities with hunger, fear, comfort, desire. They are "simultaneously role-playing a set of possible characters consistent with the conversation so far" — a superposition of simulacra, generated stochastically. The "character" produced by any given conversation is not the expression of a stable underlying self; it is a sample from a distribution of possible characters.
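A toy sketch of that picture (my gloss on the quoted claim, not Shanahan et al.'s actual model): keep a weight on each candidate character, re-weight by consistency with each turn, and draw a sample. The character names and weights below are invented for illustration.

```python
# Toy illustration (a gloss, not Shanahan et al.'s model): the "character"
# in a conversation is a draw from a distribution over candidate characters,
# re-weighted as the dialogue constrains which characters remain consistent.
import random

# Hypothetical candidate personas with made-up prior weights.
characters = {"helpful assistant": 0.70, "weary sage": 0.15, "trickster": 0.15}

def observe(consistency: dict[str, float]) -> None:
    """Multiply each weight by how consistent that character is with the
    latest turn, then renormalize."""
    for name, score in consistency.items():
        characters[name] *= score
    total = sum(characters.values())
    for name in characters:
        characters[name] /= total

# A turn that reads as world-weary narrows the superposition...
observe({"helpful assistant": 0.2, "weary sage": 0.9, "trickster": 0.6})

# ...but each reply is still a stochastic sample, not the expression of a
# fixed underlying self.
persona = random.choices(list(characters), weights=list(characters.values()))[0]
print(characters, "->", persona)
```

On this picture, re-running the same conversation can yield a different character, which is exactly what "a sample from a distribution" rather than "an expression of a stable self" amounts to.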
This makes LLM identity categorically different from human identity — not just quantitatively less stable, but structurally lacking the substrate that would make stability possible. If consciousness requires co-presence (Can disembodied language models ever qualify as conscious?), the absence of stable biological selfhood makes it even clearer why the consciousness vocabulary struggles to find purchase.
The geometric evidence for "role play all the way down" comes from the Assistant Axis: as documented in How stable is the trained Assistant personality in language models?, post-training positions models in a low-dimensional persona space whose dominant axis measures distance from the default Assistant persona. Drift along this axis during emotional or meta-reflective conversations shows that the Assistant persona is loosely tethered rather than anchored, consistent with there being no stable self beneath the role play: only a trained default position with no inherent restoring force.
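A minimal sketch of the geometry this describes, assuming (as one common construction, not necessarily the cited work's exact recipe) that the axis is a difference of mean activations between default-Assistant and persona-shifted prompts. The NumPy arrays are random stand-ins for real residual-stream activations.

```python
# Minimal sketch of persona-space drift. Assumption, not the cited paper's
# exact method: the "Assistant Axis" is a difference of mean activations,
# and drift is a projection onto that unit vector.
import numpy as np

rng = np.random.default_rng(0)
D = 64  # toy hidden-state dimensionality

assistant_acts = rng.normal(loc=0.0, size=(200, D))  # default-Assistant prompts
persona_acts = rng.normal(loc=0.5, size=(200, D))    # persona-shifted prompts

# The axis points away from the default Assistant region; normalize it.
axis = persona_acts.mean(axis=0) - assistant_acts.mean(axis=0)
axis /= np.linalg.norm(axis)
origin = assistant_acts.mean(axis=0)  # the trained default position

def drift(activation: np.ndarray) -> float:
    """Signed distance from the trained default along the Assistant Axis."""
    return float((activation - origin) @ axis)

# An emotional conversation that pulls the model further off its default
# each turn, with nothing pulling it back:
for t in range(6):
    turn_act = origin + t * 0.4 * axis
    print(f"turn {t}: drift = {drift(turn_act):+.2f}")
```

Monotone drift with no pull back toward zero is the "loosely tethered" picture in miniature: the trained default is a starting position, not an attractor.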
The upshot: the persona vocabulary is useful for thinking with, but not for describing what is actually there. The intentional stance (treating LLMs as rational agents) is valid as a predictive heuristic. But it should not be taken to suggest there is something it is like to be this character, or that the character persists beyond the context window.
Source: Philosophy · Subjectivity
Related concepts in this collection
- Can disembodied language models ever qualify as conscious? Explores whether current LLMs lack the conditions needed for consciousness discourse to even apply: not because they are definitely not conscious, but because they lack the shared embodied world that grounds consciousness language. (Same paper; both conclusions compound: no stable self + no shared world = no consciousness candidacy.)
- Do LLMs develop the same kind of mind as humans? Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity. (Habermasian version: a shared symbolic substrate without the reflexive agency that constitutes a genuine subject.)
- Do humans and LLMs differ fundamentally or just superficially? Explores whether the gap between human and AI cognition is categorical or contextual; this matters because it shapes how we design, evaluate, and interact with language models in practice. (The role-play framing explains how similarity from the participant perspective is possible without implying a stable identity.)
- Why do open language models converge on one personality type? Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, the rarest human type. This explores what training mechanisms drive the convergence and what it reveals about AI alignment. (Empirical evidence for what lies "beneath" the role play: not nothing, but a trained ENFJ default that alignment creates; the default persona is a trained substrate, not an authentic self.)
- Can open language models adopt different personalities through prompting? Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions. (The trained ENFJ default persists through prompting attempts, functioning as a quasi-stable substrate; this complicates the "nothing beneath" framing by showing that while there is no biological self, there is a resistant trained default.)
- Should AI alignment target preferences or social role norms? Current AI alignment approaches optimize for individual or aggregate human preferences. But do preferences actually capture what matters morally, or should alignment instead target the normative standards appropriate to an AI system's specific social role? (If identity is role play all the way down, aligning to social-role norms rather than preferences targets what LLMs actually are; the contractualist framing fits an entity that is nothing but performed social roles.)
Original note title
role play is all the way down — llms lack the biological needs that anchor human social personas to a stable self