Can persona prompting overcome the default ENFJ personality in language models?

This explores whether telling a model to 'be' a different personality actually changes its behavior, given that LLMs seem to default to one personality type (ENFJ) — and the corpus suggests prompting alone usually loses to baked-in training.

The ENFJ default is a curious finding: language models asked to role-play a person keep gravitating toward the same Myers-Briggs type, the warm, idealistic 'protagonist', which is among the rarest types in real humans. The short version from the corpus is that prompting alone mostly loses. Two studies find the default is sticky in a way that has little to do with how big or advanced the model is. One shows that personas systematically collapse to ENFJ and resist correction even as models get more capable, pointing to training rather than capability as the cause (Why do AI personas default to the same personality type?). Another tests open models directly and finds that most retain their trained ENFJ-like traits no matter what personality is assigned; only a few unusually flexible models comply, and even then combining a role with a personality only partly helps (Can open language models adopt different personalities through prompting?).

Why does prompting bounce off? A cluster of papers argues the personality isn't a costume the model puts on; it is installed during post-training as a genuine disposition. This 'realization' view holds that RLHF bakes in stable quasi-psychologies that survive adversarial pressure and jailbreaks, which is exactly why a surface-level prompt can't dislodge them (Are RLHF personas performed characters or realized dispositions?; Are LLM personas realized or merely simulated through training?). A complementary mapping of 'persona space' finds a single dominant axis measuring distance from the default Assistant identity, and alignment training keeps tethering the model back toward it (How stable is the trained Assistant personality in language models?). Related work frames this as alignment imposing a static communicative identity that can't switch register across contexts the way a human does (Can language models adapt communication style to different contexts?).

Here's the thing you might not expect: the methods that *do* overcome the default skip prompting entirely and reach into the model's internals. PsychAdapter modifies every transformer layer with under 0.1% extra parameters and reports 87.3% accuracy on Big Five traits; it is explicitly described as bypassing prompt resistance by working at the architecture level (Can we control personality in language models without prompting?). In the same spirit, researchers have found linear 'persona vectors' in activation space corresponding to specific traits, which can be used to monitor and steer personality shifts directly rather than asking nicely (Can we track and steer personality shifts during model finetuning?). And on the dialogue side, multi-turn RL that rewards consistency cuts persona drift by over 55% (Can training user simulators reduce persona drift in dialogue?). The pattern is consistent: weight-level or activation-level intervention works where text instructions don't.
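To make the 'persona vector' idea concrete, here is a minimal sketch of what a linear trait direction could look like in practice. It assumes the direction is estimated as a difference of mean activations between trait-eliciting and baseline prompts, which is a common way to extract such directions but is not confirmed by the notes above. The function names (`persona_vector`, `project`, `steer`), the steering strength `alpha`, and the toy three-dimensional activations are all hypothetical and stand in for real hidden states.

```python
from typing import List

Vector = List[float]


def mean_vector(rows: List[Vector]) -> Vector:
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def persona_vector(trait_acts: List[Vector], baseline_acts: List[Vector]) -> Vector:
    """Assumed extraction rule: mean(trait activations) - mean(baseline activations)."""
    mu_trait = mean_vector(trait_acts)
    mu_base = mean_vector(baseline_acts)
    return [t - b for t, b in zip(mu_trait, mu_base)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def project(h: Vector, v: Vector) -> float:
    """Monitoring: scalar component of activation h along the unit-normalised direction v."""
    return dot(h, v) / (dot(v, v) ** 0.5)


def steer(h: Vector, v: Vector, alpha: float) -> Vector:
    """Steering: shift activation h by alpha times the trait direction."""
    return [x + alpha * y for x, y in zip(h, v)]


# Toy activations (hypothetical): only the first coordinate differs with the trait.
trait_acts = [[1.0, 2.0, 0.0], [3.0, 2.0, 0.0]]
baseline_acts = [[0.0, 2.0, 0.0], [0.0, 2.0, 0.0]]

v = persona_vector(trait_acts, baseline_acts)
h = [0.5, 1.0, -1.0]

before = project(h, v)
after = project(steer(h, v, alpha=1.5), v)
print(v, before, after)
```

In a real model the vectors would be hidden states read from a chosen layer, and the steered activation would be written back during the forward pass; the sketch only shows the arithmetic, which is why this kind of intervention does not depend on the prompt at all.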

There's also a deeper reason prompting struggles that goes beyond the ENFJ default specifically. When the same persona prompt is run repeatedly, the variation across runs matches or exceeds the variation across entirely different personas, meaning that what looks like 'personality' is often just model uncertainty churning, not a stable adopted character (Why do LLM persona prompts produce inconsistent outputs across runs?). So even when a prompt seems to shift behavior, the shift may be noise rather than a real override. The honest takeaway: persona prompting can nudge but rarely overcomes the trained default, and if you actually need a different personality to hold, the leverage is in training, adapters, or activation steering, not in the prompt.
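The run-to-run comparison can be checked with a simple variance split. The sketch below assumes each persona prompt has been run several times and each run reduced to a single numeric score (for example, a trait questionnaire score); the persona names, the scores, and the `variance_split` helper are made up for illustration and are not taken from the cited note.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def variance_split(scores: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (within, between) variance for repeated persona runs.

    within:  average variance across runs of the same persona prompt
    between: variance of the per-persona mean scores
    """
    within = mean([pvariance(runs) for runs in scores.values()])
    between = pvariance([mean(runs) for runs in scores.values()])
    return within, between


# Hypothetical scores: five runs for each of three persona prompts.
scores = {
    "introvert": [2.0, 4.0, 3.0, 5.0, 1.0],
    "extravert": [3.0, 5.0, 4.0, 6.0, 2.0],
    "skeptic": [1.0, 3.0, 2.0, 4.0, 0.0],
}

within, between = variance_split(scores)

# If runs of one prompt vary at least as much as different prompts do,
# an apparent persona effect is hard to tell apart from sampling noise.
noise_dominated = within >= between
print(within, between, noise_dominated)
```

With these toy numbers the within-persona variance is larger than the between-persona variance, which is the pattern the note describes: the prompt moves the average a little, but a single run tells you almost nothing about which persona was requested.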

Sources 10 notes

Why do AI personas default to the same personality type?

Research shows language models assigned personas systematically default to ENFJ (one of the rarest human types) and exhibit motivated reasoning that persists across model generations. Persona consistency does not improve with advanced models, suggesting training-induced alignment rather than capability limits.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Are RLHF personas performed characters or realized dispositions?

Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

How stable is the trained Assistant personality in language models?

Research mapping hundreds of character archetypes reveals a low-dimensional persona space where the leading component measures distance from the default Assistant. Emotional and meta-reflective conversations cause predictable drift, but activation capping along this axis mitigates harmful shifts without degrading capabilities.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can we control personality in language models without prompting?

PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.

Can we track and steer personality shifts during model finetuning?

Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
