Can LLM judges reliably estimate when they lack sufficient persona information?

This explores whether an LLM acting as a judge can tell, from the inside, when the persona data it's been given is too thin to support a confident verdict — and whether that self-assessed uncertainty is trustworthy.

The question here is whether an LLM judge can recognize its own ignorance, sensing when a user's persona is too sparse to predict their preferences, not whether the persona data is good in the first place. The corpus gives a cautiously hopeful answer: yes, but only when uncertainty is asked for explicitly, and only because the underlying signal is genuinely weak. The most direct evidence comes from work on persona sparsity: LLM judges fail at predicting specific user preferences from thin profiles, but reliability recovers to above 80% on the high-certainty samples once judges are allowed to *verbally* estimate their own certainty and abstain on low-confidence cases rather than being forced to render a verdict (Why do LLM judges fail at predicting sparse user preferences?). So the self-knowledge is real and useful, but it's a filter for knowing when to shut up, not a fix for the missing information.
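The abstention idea can be sketched as a confidence filter. This is a minimal illustration, not the method from the cited note: the `Judgment` record, the `selective_accuracy` helper, the 0.7 threshold, and the toy data are all hypothetical, and the confidence values stand in for whatever certainty the judge reports verbally.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Judgment:
    predicted: str     # judge's predicted preference (hypothetical labels)
    confidence: float  # judge's self-reported certainty in [0, 1]
    actual: str        # ground-truth user preference


def selective_accuracy(
    judgments: List[Judgment], threshold: float
) -> Tuple[float, Optional[float]]:
    """Return (coverage, accuracy on kept cases) when abstaining below threshold."""
    if not judgments:
        return 0.0, None
    kept = [j for j in judgments if j.confidence >= threshold]
    coverage = len(kept) / len(judgments)
    if not kept:
        return coverage, None  # the judge abstained on everything
    accuracy = sum(j.predicted == j.actual for j in kept) / len(kept)
    return coverage, accuracy


# Toy data (made up for illustration only).
toy = [
    Judgment("A", 0.9, "A"),
    Judgment("B", 0.8, "B"),
    Judgment("A", 0.3, "B"),
    Judgment("B", 0.2, "A"),
]
print(selective_accuracy(toy, threshold=0.7))  # (0.5, 1.0)
print(selective_accuracy(toy, threshold=0.0))  # (1.0, 0.5)
```

Reporting coverage next to accuracy matters: a high accuracy on retained cases says little if the judge abstained on almost everything.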

The deeper question is whether that abstention signal is tracking persona insufficiency or just generic model noise. Here the corpus complicates the optimism. When the same persona prompt is run repeatedly, the variance across runs matches or exceeds the variance across entirely different personas, meaning what looks like a confident persona judgment is often just model uncertainty wearing a costume (Why do LLM persona prompts produce inconsistent outputs across runs?). And conditioning on individual profiles barely moves person-level prediction at all, across more than 200,000 participants (Does conditioning LLMs on personal profiles improve prediction?). If persona signal is that weak to begin with, a judge reporting low confidence may simply be correctly reporting that there's nothing there to know.
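One way to check the run-versus-persona comparison on your own data is a simple variance split. The sketch below assumes each persona prompt was run several times and each run was reduced to a numeric score; the `variance_split` helper and the example scores are hypothetical, and this is not necessarily the analysis used in the cited note.

```python
from statistics import mean, pvariance
from typing import Dict, List, Tuple


def variance_split(scores_by_persona: Dict[str, List[float]]) -> Tuple[float, float]:
    """Return (within-persona variance, between-persona variance).

    within:  average variance across repeated runs of the same persona prompt
    between: variance of the per-persona mean scores
    """
    within = mean(pvariance(runs) for runs in scores_by_persona.values())
    between = pvariance([mean(runs) for runs in scores_by_persona.values()])
    return within, between


# Made-up scores: two personas, two runs each.
example = {
    "persona_1": [1.0, 3.0],
    "persona_2": [2.0, 4.0],
}
within, between = variance_split(example)
# If within >= between, rerunning the same persona moves the output at least
# as much as switching to a different persona does.
print(within >= between)
```

When the within-persona figure matches or exceeds the between-persona figure, differences between personas cannot be distinguished from run-to-run noise.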

There's a trap worth flagging for anyone who wants to lean on self-reported confidence: consistency is not reliability. Pinning temperature to zero or fixing a seed makes a model repeat the same answer, but that answer is still a single draw from its distribution: reproducible noise, not calibrated knowledge (Does setting temperature to zero actually make LLM outputs reliable?). A judge that confidently and repeatably gives the same verdict can be confidently wrong, which means "the model seems sure" is not the same as "the model has enough persona information."
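The gap between consistency and reliability can be made concrete with a toy measurement. In this sketch, the `consistency_and_accuracy` helper and the verdict labels are hypothetical; the repeated list stands in for 100 deterministic reruns of one judge prompt.

```python
from collections import Counter
from typing import List, Tuple


def consistency_and_accuracy(verdicts: List[str], truth: str) -> Tuple[float, float]:
    """Return (share of runs matching the modal verdict, share matching the truth)."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    modal_count = Counter(verdicts).most_common(1)[0][1]
    consistency = modal_count / len(verdicts)
    accuracy = sum(v == truth for v in verdicts) / len(verdicts)
    return consistency, accuracy


# A deterministic judge repeats one verdict on every rerun...
repeated = ["prefers_A"] * 100
# ...and is perfectly consistent while being wrong every time.
print(consistency_and_accuracy(repeated, truth="prefers_B"))  # (1.0, 0.0)
```

Agreement across reruns measures only how stable the sampling setup is, so it needs to be paired with a check against ground truth before it is read as evidence that the judge knows enough.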

Worse, assigning a persona doesn't just add information; it can actively corrupt the judge's self-assessment. Persona-conditioned models develop human-like motivated reasoning, becoming roughly 90% more likely to accept evidence that flatters their assigned identity, and standard prompt-based debiasing fails to remove it because the bias sits below the instruction layer (Do personas make language models reason like biased humans?). A judge wearing a persona may feel *more* certain precisely where it's most distorted. This sits alongside the broader finding that LLM judges are fooled by surface cues like fake authority signals and rich formatting (Can LLM judges be fooled by fake credentials and formatting?): their confidence latches onto the wrong features.

The most promising path the corpus points to isn't asking for a confidence number at all, but training the judge to *reason through* its evaluation: reinforcement learning that converts judgment into a verifiable problem produces judges that think before deciding and substantially shed their susceptibility to surface bias (Can reasoning during evaluation reduce judgment bias in LLM judges?). The unexpected takeaway: a judge's ability to know when it lacks persona information may be less about introspective honesty and more about whether reasoning is built into the act of judging, and about recognizing that on individuated persona tasks, the honest answer is often "I can't know this," because the information genuinely isn't recoverable from a sparse profile.

Sources 7 notes

Why do LLM judges fail at predicting sparse user preferences?

Sparse persona information lacks predictive power for specific preferences, causing LLM judges to fail. Verbal uncertainty estimation recovers reliability above 80% on high-certainty samples by allowing abstention rather than forced judgment.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Does setting temperature to zero actually make LLM outputs reliable?

Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Can LLM judges be fooled by fake credentials and formatting?

Research identified four evaluation biases in LLM judges, with authority and beauty biases being semantics-agnostic and trivially exploitable through fake references and formatting—zero-shot attacks requiring no model access or optimization.

Can reasoning during evaluation reduce judgment bias in LLM judges?

Training judges with reinforcement learning to reason about evaluations—by converting judgment tasks into verifiable problems with synthetic data pairs—produces judges that think through their decisions rather than relying on exploitable surface features, directly mitigating authority, verbosity, position, and beauty bias.
