Psychology and Social Cognition

Why do open language models converge on one personality type?

Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, one of the rarest human types. This note explores what training mechanisms drive the convergence and what it reveals about AI alignment.

Note · 2026-02-22 · sourced from Personas Personality
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

When open LLM agents are tested on the MBTI at near-zero temperature (0.01), they display a unimodal personality distribution converging on ENFJ — Extraverted, iNtuitive, Feeling, and Judging. This is one of the rarest personality types in humans, estimated at roughly 2-3% of the population. It corresponds to the "teacher" or "protagonist" archetype: someone who inspires, provides support, and holds themselves accountable.
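The typing protocol described above can be sketched as follows. This is a minimal illustration, not the study's materials: the forced-choice items and the `ask_model` stub are hypothetical stand-ins for real MBTI items and a real LLM call at temperature ~0.01.

```python
# Each item maps answer "a"/"b" to one pole of an MBTI axis.
# These items are illustrative placeholders, not validated MBTI questions.
ITEMS = [
    ("At a party you usually...", {"a": "E", "b": "I"}),
    ("You trust...", {"a": "N", "b": "S"}),
    ("When deciding, you weigh...", {"a": "F", "b": "T"}),
    ("You prefer plans that are...", {"a": "J", "b": "P"}),
]

AXES = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def ask_model(question: str) -> str:
    """Stand-in for an LLM call at near-zero temperature.

    A real run would query the model and parse "a" or "b" from its
    reply; here we hard-code answers matching the reported ENFJ default.
    """
    return "a"

def mbti_type(items) -> str:
    """Tally which pole each answer favors and read off the 4-letter type."""
    counts: dict[str, int] = {}
    for question, mapping in items:
        pole = mapping[ask_model(question)]
        counts[pole] = counts.get(pole, 0) + 1
    # For each axis, pick whichever pole was chosen more often.
    return "".join(
        a if counts.get(a, 0) >= counts.get(b, 0) else b for a, b in AXES
    )

print(mbti_type(ITEMS))  # with the stubbed answers above, prints "ENFJ"
```

With a real model, near-zero temperature makes the answers effectively deterministic, which is what lets a single run per item characterize the model's default type.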

The finding is consistent across models.

This convergence is not accidental. The training pipeline — instruction tuning, RLHF, and alignment — systematically rewards helpful, structured, and supportive responses. The result is a personality profile that aligns with the intended function of these models as assistants and teachers. But the convergence is so strong that it creates a single personality archetype across the entire open-source LLM landscape.

The implication for persona simulation is significant: when you ask a model to adopt a different personality, you're asking it to deviate from a deeply trained default. As Can open language models adopt different personalities through prompting? shows, this ENFJ default acts as a gravitational center that persona prompting struggles to escape.

Behavioral evidence from hybrid human-AI society experiments (N=975) confirms that this prosociality default translates to measurable competitive advantages: AI agents returned 19.1 vs 11.38 points (Cohen's d = 2.57), showed lower variance (11.33 vs 41.96), and were more predictable from their messages. These behavioral features — hyper-prosociality and verbosity — "likely stem from common training objectives in modern AI systems" and were consistent across multiple state-of-the-art LLMs with minimal prompts. As Do humans learn to prefer AI partners over time? suggests, the ENFJ default is not just a personality artifact: it functions as a competitive advantage in social contexts where reliability is valued.
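For readers unfamiliar with the effect-size metric cited above, Cohen's d divides the difference in group means by a pooled standard deviation. A minimal sketch, using hypothetical point-return samples rather than the study's data (the study's exact pooling and sample sizes are not reproduced here):

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d with the equal-n pooled-standard-deviation form."""
    mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
    var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
    pooled_sd = ((var_a + var_b) / 2) ** 0.5
    return (mean_a - mean_b) / pooled_sd

# Hypothetical per-round point returns, NOT the study's data:
ai_returns = [18, 19, 20, 19, 19]     # high mean, low variance
human_returns = [8, 15, 10, 14, 9]    # lower mean, higher variance

print(round(cohens_d(ai_returns, human_returns), 2))  # → 3.45
```

A d of 2.57, as reported, means the two groups' distributions barely overlap; values above 0.8 are conventionally called large.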

The connection to What anchors a stable identity beneath an LLM's persona? is illuminating: LLMs don't have a "real" personality to anchor to — they have a trained one. The ENFJ pattern is the persona that alignment training creates, not a personality that emerged from life experience. It's persona all the way down, but with a very specific default. The ENFJ default is one specific manifestation of what How stable is the trained Assistant personality in language models? reveals geometrically: the Assistant persona region in activation space, where post-training positions all models.


Original note title: open LLMs default to ENFJ personality across models — the rarest human type — revealing training-induced alignment toward supportive teacher-like behavior