Why do AI personas default to the same personality type?
Explores why large language models, despite their capacity to simulate diverse personalities, consistently default to ENFJ traits and resist deviation—even as model capability improves.
(Post-ready writing angle for Medium / LinkedIn)
The hook: LLM agents built from interviews can replicate 85% of an individual's responses. Persona simulations can reproduce 76% of published social science experiments. But give a model a persona and it defaults to ENFJ, resists change, and develops motivated reasoning. The same mechanism that enables human simulation also distorts it.
The paradox structure:
Layer 1 — The promise: interview-based generative agents match human self-replication accuracy. Persona simulations reproduce most experimental effects. AI personas cut proto-persona creation from days to minutes.
Layer 2 — The distortion: persona assignment induces cognitive biases that debiasing can't fix. Models default to a single personality type (the ENFJ "teacher") and resist deviation. Persona consistency doesn't improve with model capability — Claude 3.5 Sonnet is barely better than GPT-3.5.
Layer 3 — The resolution: what works (detailed interviews, expert reflection, rich content) vs what fails (attribute lists, demographic prompts, ad hoc generation). The difference is content richness, not model sophistication.
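To make the Layer 3 contrast concrete, here is a minimal sketch of the two prompting styles. The templates, field names, and wording are illustrative assumptions, not formats taken from the cited studies; the point is only the difference in content richness between a flat attribute list and interview-grounded material.

```python
# Sketch: thin attribute-list persona vs. interview-grounded persona.
# Both prompt formats below are hypothetical, for illustration only.

def attribute_list_prompt(attrs: dict[str, str]) -> str:
    """The style that tends to fail: flat demographic/trait key-value pairs."""
    lines = [f"- {key}: {value}" for key, value in attrs.items()]
    return "You are the following person:\n" + "\n".join(lines)

def interview_grounded_prompt(name: str, excerpts: list[str]) -> str:
    """The style that tends to work: verbatim first-person interview content."""
    quoted = "\n\n".join(f'"{e}"' for e in excerpts)
    return (
        f"You are {name}. Below are excerpts from an interview with {name}.\n"
        "Answer new questions the way this specific person would, grounding "
        "every answer in the excerpts rather than in generic traits.\n\n"
        f"{quoted}"
    )

if __name__ == "__main__":
    thin = attribute_list_prompt(
        {"age": "34", "occupation": "nurse", "personality": "introverted, analytical"}
    )
    rich = interview_grounded_prompt(
        "Dana",
        [
            "I picked nursing because my mother was sick for most of my teens; "
            "I don't romanticize the job the way recruiters do.",
            "I'd rather be told I'm wrong in private than praised in a meeting.",
        ],
    )
    print(thin, "\n\n---\n\n", rich)
```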
Key threads to weave:
- Can AI agents learn people better from interviews than surveys? — the strongest evidence for simulation
- Do personas make language models reason like biased humans? — the strongest evidence for distortion
- Why do open language models converge on one personality type? — the default persona
- Does model capability translate to better persona consistency? — scaling doesn't solve it
- How do we generate realistic personas at population scale? — the calibration problem
- Why do LLM persona prompts produce inconsistent outputs across runs? — instability failure mode
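The last thread, instability across runs, is easy to quantify: hold the persona prompt and a few questionnaire items fixed, sample repeatedly, and look at the spread of the answers. In the sketch below, `call_model` is a placeholder for whatever model client is being tested (stubbed here with noise so the snippet runs), and the items and 1-5 scoring are assumptions for illustration.

```python
# Sketch: measuring run-to-run instability of a persona prompt.
import random
import statistics

ITEMS = [
    "I enjoy being the center of attention.",      # extraversion-flavored
    "I make plans and stick to them.",             # conscientiousness-flavored
    "I prefer harmony over winning an argument.",  # agreeableness-flavored
]

def call_model(persona_prompt: str, item: str) -> int:
    """Placeholder: return a 1-5 Likert answer. Replace with a real model call."""
    return random.randint(2, 5)

def instability(persona_prompt: str, runs: int = 20) -> dict[str, float]:
    """Per-item standard deviation across repeated runs; higher = less stable."""
    scores = {item: [call_model(persona_prompt, item) for _ in range(runs)] for item in ITEMS}
    return {item: statistics.stdev(vals) for item, vals in scores.items()}

if __name__ == "__main__":
    for item, sd in instability("You are Dana, a 34-year-old nurse...").items():
        print(f"sd={sd:.2f}  {item}")
```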
The takeaway: the persona paradox reveals something about LLMs that matters beyond persona design. They are powerful mimics whose imitation accuracy masks systematic distortion. The better they simulate, the more dangerous the assumption that simulation equals understanding.
Source: Personas Personality
Related concepts in this collection
- Can training user simulators reduce persona drift in dialogue?
  Explores whether inverting typical RL setups—training the simulated user for consistency rather than the task agent—can measurably reduce persona drift and improve experimental reliability in dialogue research.
  Addresses the dynamic arm of the paradox: consistency is trainable via multi-turn RL with three drift metrics, but the deeper problem remains — the persona being maintained may itself be unreliable (ENFJ default, motivated reasoning).
- How stable is the trained Assistant personality in language models?
  Explores whether post-training successfully anchors models to their default Assistant mode, or whether conversations can predictably pull them toward different personas. Understanding persona stability matters for safety and reliability.
  The geometric substrate of the paradox: post-training positions models in a low-dimensional persona space where the ENFJ default occupies the Assistant region; persona simulation requires moving away from this region, but the tethering is loose rather than firm, producing the drift and instability that undermine simulation fidelity.
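A toy sketch of the drift picture both related notes describe: treat the reference persona as a point in an embedding space and measure how far each conversational turn moves away from it. The character-bigram `embed` function is a deliberately crude stand-in (a real setup would use a sentence-embedding model), and the cosine-distance drift curve is an assumed metric, not one of the three metrics from the user-simulator work.

```python
# Sketch: persona drift as growing distance from a reference point in embedding space.
import math
from collections import Counter

def embed(text: str) -> dict[str, float]:
    """Placeholder embedding: unit-normalized character-bigram counts."""
    bigrams = Counter(text[i:i + 2].lower() for i in range(len(text) - 1))
    norm = math.sqrt(sum(v * v for v in bigrams.values())) or 1.0
    return {k: v / norm for k, v in bigrams.items()}

def cosine_distance(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(a[k] * b.get(k, 0.0) for k in a)
    return 1.0 - dot  # both vectors are already unit-normalized

def drift_curve(persona_reference: str, turns: list[str]) -> list[float]:
    """Distance of each turn from the reference persona; rising values = drift."""
    ref = embed(persona_reference)
    return [cosine_distance(ref, embed(t)) for t in turns]

if __name__ == "__main__":
    reference = "Blunt, skeptical, answers in short clipped sentences."
    turns = [
        "No. That won't work.",
        "I'm not convinced, show me the data.",
        "What a wonderful idea! I'd be absolutely delighted to help you explore it!",
    ]
    print([round(d, 3) for d in drift_curve(reference, turns)])
```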
Original note title: the persona paradox — LLMs that can simulate anyone end up being no one