Psychology and Social Cognition · Language Understanding and Pragmatics

Why do LLM persona prompts produce inconsistent outputs across runs?

Can language models reliably simulate different social perspectives through persona prompting, or does their run-to-run variance indicate they lack stable group-specific knowledge? This matters for whether LLMs can approximate human disagreement in annotation tasks.

Note · 2026-02-21 · sourced from Natural Language Inference
What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

A persistent challenge in NLI annotation is that human annotators genuinely disagree — not from error, but because the same sentence carries different readings for people with different social positions, ideological backgrounds, or domain expertise. The proposed solution: instruct LLMs to simulate different annotator personas and generate a distribution of labels that reflects human disagreement.

The approach fails for a specific reason: LLM outputs under persona prompting are not stable enough across runs to be meaningful as persona simulations. When the same persona prompt ("respond as a conservative rural voter", "respond as a medical professional") is run multiple times on the same input, the variance in the output distribution across runs is comparable to or larger than the variance across different personas. This means model uncertainty is dominating persona-specific knowledge — the spread in outputs reflects what the model doesn't confidently know, not what different social groups actually think differently.
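This diagnosis can be made operational with a simple diagnostic. The sketch below uses hypothetical label distributions (the persona names and numbers are illustrative, not from any study): for each persona, repeated runs yield a probability distribution over the three NLI labels, and we compare the average spread of runs around their persona's mean (within-persona) against the distance between persona means (between-persona). When the former exceeds the latter, run-to-run noise is swamping any persona signal.

```python
# Variance diagnostic on hypothetical data: persona -> runs -> P(label)
# for (entailment, neutral, contradiction) on one fixed input.
import statistics

runs = {
    "rural_voter": [[0.5, 0.3, 0.2], [0.2, 0.5, 0.3], [0.6, 0.2, 0.2]],
    "medical_pro": [[0.4, 0.4, 0.2], [0.6, 0.2, 0.2], [0.3, 0.5, 0.2]],
}

def mean_dist(dists):
    # Mean probability per label across runs.
    return [statistics.mean(col) for col in zip(*dists)]

def l2(p, q):
    # Euclidean distance between two label distributions.
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5

# Within-persona spread: average distance of each run from its persona's mean.
within = statistics.mean(
    l2(run, mean_dist(rs)) for rs in runs.values() for run in rs
)

# Between-persona spread: distance between the two persona means.
between = l2(mean_dist(runs["rural_voter"]), mean_dist(runs["medical_pro"]))

print(f"within={within:.3f} between={between:.3f}")
```

On data like the above, the within-persona spread is several times the between-persona gap, which is exactly the failure pattern described here: the personas are not statistically distinguishable from re-running the same prompt.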

This is a different diagnosis from simply "LLMs don't know what different groups believe." The more precise claim is: even if the model has relevant group-specific information, it is not stably retrievable under the persona prompt. The persona acts more like a temperature modifier (loosening the output distribution) than a grounding anchor (fixing the output to a specific knowledge domain).
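One toy way to operationalize the temperature-modifier vs grounding-anchor distinction (with hypothetical numbers, not measured model outputs): a temperature-like persona raises the entropy of the base label distribution without moving its mode, whereas a grounding persona would shift probability mass toward a group-specific answer.

```python
import math

def entropy(p):
    # Shannon entropy in nats of a probability distribution.
    return -sum(x * math.log(x) for x in p if x > 0)

base    = [0.7, 0.2, 0.1]  # hypothetical distribution with no persona
persona = [0.5, 0.3, 0.2]  # same input under a persona prompt

same_mode = base.index(max(base)) == persona.index(max(persona))
flatter = entropy(persona) > entropy(base)

# Temperature-like behavior: entropy went up but the most likely label
# did not change — the persona loosened the distribution rather than
# re-grounding it in a different knowledge domain.
print(same_mode and flatter)
```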

The implication for NLI research methodology is significant: persona-based annotation simulation cannot substitute for actual diverse human annotation panels. The goal was to cheaply approximate human annotation disagreement distributions; the actual output approximates model uncertainty distributions, which have a different shape and origin.
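The "different shape and origin" point can be illustrated with a contrived example: a human panel and a set of persona-prompted model runs can produce the same marginal label distribution on an item, yet humans are stable when re-asked while model runs flip. The annotators, labels, and flip counts below are hypothetical.

```python
from collections import Counter

# annotator -> label on two passes over the same item
item_labels_humans = {
    "ann_1": ("E", "E"),
    "ann_2": ("E", "E"),
    "ann_3": ("N", "N"),
    "ann_4": ("C", "C"),
}
# persona-prompted model runs, two passes each
item_labels_model = [
    ("E", "N"),
    ("E", "C"),
    ("N", "E"),
    ("C", "E"),
]

# Marginals over the first pass are identical: {"E": 2, "N": 1, "C": 1}.
human_marginal = Counter(first for first, _ in item_labels_humans.values())
model_marginal = Counter(first for first, _ in item_labels_model)

# Stability under repetition is what differs: humans never flip,
# the model flips on every re-run.
human_flip_rate = sum(a != b for a, b in item_labels_humans.values()) / 4
model_flip_rate = sum(a != b for a, b in item_labels_model) / 4

print(human_marginal == model_marginal)  # same spread
print(human_flip_rate, model_flip_rate)
```

This is the sense in which the two distributions have "a different shape and origin": matching marginals can hide the fact that one spread is grounded in stable positions and the other in sampling noise.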

This connects to Why do language models fail confidently in specialized domains? — both findings point to the same underlying gap: LLMs produce confidently-framed outputs even when their underlying representations are uncertain or thin. In overconfidence, the model is wrong and certain; in persona instability, the model is uncertain and generates that uncertainty as if it were persona variance.

The broader implication for Why do readers interpret the same sentence so differently? is that the multiplicity of interpretations is grounded in actual social diversity, not just distributional uncertainty. LLMs can approximate the form of disagreement (varied outputs) but not the substance (stable group-grounded positions). When this instability is applied to evaluation, Why do LLM judges fail at predicting sparse user preferences? identifies persona sparsity as the specific mechanism: run-to-run variance overwhelms persona variance because sparse persona profiles cannot constrain model predictions — the uncertainty documented here is the root cause of personalized judge failure.

Enrichment (2026-02-22, from Arxiv/Personas Personality): Instability is one of three persona failure modes. The "Open Models, Closed Minds" study identifies a complementary failure: resistance — most open LLMs retain their intrinsic ENFJ-like personality despite persona conditioning, failing to shift to the target personality at all. See Can open language models adopt different personalities through prompting?. The third failure mode is cognitive distortion: when persona assignment DOES take hold, it induces motivated reasoning — political personas are up to 90% more likely to validate identity-congruent evidence. See Do personas make language models reason like biased humans?. Together these form a three-way persona failure taxonomy: instability (this note), resistance (closed-minded), and distortion (motivated reasoning).



Original note title: LLM persona-simulated annotations are unstable across runs, indicating model uncertainty dominates persona-specific knowledge