Can open language models adopt different personalities through prompting?
Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
The "Open Models, Closed Minds" study tested whether open LLMs can mimic human personalities when conditioned through prompting. The finding: most cannot. When given personality-conditioning prompts, the majority of models retain their intrinsic traits — the ENFJ-like default — rather than shifting to the target personality. The authors call this being "closed-minded."
Only a few models (SOLAR, NeuralChat, Llama3-8, Dolphin) demonstrate genuine flexibility, successfully mirroring imposed personalities regardless of temperature setting. The rest are stubborn.
A partial solution emerges: combining role conditioning (e.g., "you are a dentist") with personality conditioning (e.g., "you are introverted and analytical") produces better results than personality conditioning alone. The ENFJ-default model, whose archetype is often nicknamed "the teacher," responds to being given a concrete professional role because roles provide behavioral anchors that abstract personality dimensions lack.
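The two conditioning strategies can be sketched as prompt templates. This is a minimal illustration, not the study's exact wording; the function names and phrasing are assumptions.

```python
# Sketch of the two conditioning strategies the study compares.
# Prompt wording is illustrative, not the paper's actual templates.

def personality_prompt(traits: list[str]) -> str:
    """Personality-only conditioning: abstract trait instructions,
    which most open models resist."""
    return f"You are {', '.join(traits)}. Answer accordingly."

def role_personality_prompt(role: str, traits: list[str]) -> str:
    """Role + personality conditioning: a concrete professional role
    acts as a behavioral anchor for the abstract traits."""
    return (f"You are a {role}. You are {', '.join(traits)}. "
            "Answer as this person would.")

weak = personality_prompt(["introverted", "analytical"])
strong = role_personality_prompt("dentist", ["introverted", "analytical"])
```

In practice `strong` would be supplied as the system prompt; the study's finding is that the added role clause, not the trait list, does most of the steering work.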
This is a different failure mode from the one described in "Why do LLM persona prompts produce inconsistent outputs across runs?". That finding shows run-to-run instability: the model's output varies unpredictably under persona prompts. This finding shows resistance: the model's output remains stubbornly stable on its default personality regardless of prompts. Together they form two sides of a persona failure taxonomy:
- Instability: model generates varying outputs that reflect uncertainty, not persona knowledge
- Resistance: model retains intrinsic personality traits despite conditioning attempts
- Motivated reasoning: persona conditioning introduces cognitive biases (see Do personas make language models reason like biased humans?)
The practical implication: persona engineering requires more than prompting. Role-personality combinations work better than personality alone. But even then, model selection matters — most models simply cannot be steered to arbitrary personality configurations through in-context methods.
Source: Personas Personality
Related concepts in this collection
- Why do LLM persona prompts produce inconsistent outputs across runs? Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks. (Complementary failure mode: instability vs. resistance.)
- Why do open language models converge on one personality type? Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, the rarest human type. This explores what training mechanisms drive this convergence and what it reveals about AI alignment. (The default personality that models resist changing.)
- Does model capability translate to better persona consistency? As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence. (Capability scaling doesn't help either.)
- What anchors a stable identity beneath an LLM's persona? Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational? (Personality resistance complicates the "nothing beneath" claim: the trained ENFJ default functions as a quasi-stable substrate that persists across prompting attempts, even though it's a training artifact rather than a biological self.)
- How stable is the trained Assistant personality in language models? Explores whether post-training successfully anchors models to their default Assistant mode, or whether conversations can predictably pull them toward different personas. Understanding persona stability matters for safety and reliability. (Provides the geometric explanation for closed-mindedness: prompt-based personality conditioning may fail because it cannot shift activations far enough from the Assistant region of persona space; the "loose tethering" is what makes models resistant to prompt-level persona change.)
Original note title: most open LLMs are closed-minded to personality conditioning — retaining intrinsic traits despite prompting while combining role and personality conditioning partially overcomes resistance