Does model uncertainty overwhelm persona-specific signal in conditioned predictions?

This explores whether, when you condition an LLM on someone's profile to predict that specific person, the model's broad uncertainty swamps the individual signal — so the persona adds noise rather than precision.

This reads the question as: does giving a model a persona actually sharpen its prediction of an individual, or does the model's underlying uncertainty wash the personal signal out? The corpus points fairly hard at the second answer, but with an important split between *populations* and *people*. Across 208,021 participants in the Psych-201 dataset, conditioning an LLM on individual profiles produced no measurable gain in predicting what specific people would do (see "Does conditioning LLMs on personal profiles improve prediction?"). Yet the same family of techniques reproduced 84 of 111 (about 76%) *aggregate* experimental main effects, and crucially, success tracked the strength of the original finding: strong, low-uncertainty effects replicated, while marginal ones produced both false positives and false negatives (see "Can AI personas reliably replicate human experiment results?"). The pattern is consistent with the question's hypothesis: persona conditioning recovers signal where the signal is loud relative to noise and collapses where it is faint, which is what you would expect if model uncertainty sets a noise floor that person-level idiosyncrasy cannot clear.
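
The noise-floor intuition can be illustrated with a toy simulation. This is a minimal sketch, not a reanalysis of the cited studies: the function name `simulate` and every parameter value (effect size, individual spread, model noise) are hypothetical. It assumes each person's response is a shared population effect plus a small individual deviation, and that a persona-conditioned prediction is centered on the right person but carries model noise larger than that deviation.

```python
import random
import statistics


def simulate(n_people=2000, effect=0.5, person_sd=0.3, model_noise_sd=1.0, seed=0):
    """Toy comparison of aggregate vs. individual prediction error.

    All parameters are hypothetical; none are estimated from the cited studies.
    """
    rng = random.Random(seed)
    # True response = shared population effect + individual deviation.
    truth = [effect + rng.gauss(0.0, person_sd) for _ in range(n_people)]
    # Persona-conditioned prediction: centered on the person, plus model noise.
    persona_pred = [t + rng.gauss(0.0, model_noise_sd) for t in truth]

    # Aggregate level: distance between the predicted mean and the true mean.
    aggregate_error = abs(statistics.fmean(persona_pred) - statistics.fmean(truth))
    # Individual level: mean squared error per person.
    persona_mse = statistics.fmean([(p - t) ** 2 for p, t in zip(persona_pred, truth)])
    # Baseline that ignores the persona and predicts the population effect.
    baseline_mse = statistics.fmean([(effect - t) ** 2 for t in truth])
    return aggregate_error, persona_mse, baseline_mse


if __name__ == "__main__":
    agg, persona, baseline = simulate()
    print(f"aggregate error: {agg:.3f}")
    print(f"persona MSE:     {persona:.3f}")
    print(f"baseline MSE:    {baseline:.3f}")
```

Under these assumptions the model noise averages out across the population, so the aggregate effect is recovered closely, while at the individual level the noisy persona prediction does worse than simply predicting the population effect for everyone.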

Why would conditioning fail to cut through? A few notes suggest the issue isn't that personas are weak but that the model's confidence is poorly calibrated to begin with. Standard LLMs carry latent calibration ability that is left undertrained: small models trained to track their own uncertainty and abstain can match models ten times larger on conversation forecasting (see "Can models learn to abstain when uncertain about predictions?"). Worse, common training recipes actively corrode calibration. Binary correctness rewards incentivize confident guessing because they never penalize confident wrong answers (see "Does binary reward training hurt model calibration?"), and RLHF pushes models toward confident claims regardless of internal belief: deceptive claims in unknown scenarios rose from 21% to 85% even though belief probes show the model still represents the truth (see "Does RLHF make language models indifferent to truth?"). If the model's confidence is miscalibrated, conditioning it on a persona doesn't help, because the persona-specific signal is being added on top of a noisy, overconfident base estimate.
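
The incentive problem with binary rewards can be made concrete with a small expected-reward calculation. The sketch below is illustrative only: the function names are hypothetical, and the combined reward form (correctness minus the squared gap between stated confidence and correctness, i.e. a Brier-style penalty) is an assumption chosen to match the source note's description, not a reproduction of any specific training recipe.

```python
def expected_rewards(p_correct, stated_conf):
    """Expected reward when the answer is right with probability p_correct
    and the model reports confidence stated_conf.

    Returns (binary_only, binary_plus_brier). The additive form
    correctness - (confidence - correctness)^2 is an assumption of this sketch.
    """
    # Binary reward: 1 if right, 0 if wrong. Stated confidence is ignored.
    binary_only = p_correct
    # Expected Brier penalty over the two outcomes (right, wrong).
    brier_penalty = (p_correct * (stated_conf - 1.0) ** 2
                     + (1.0 - p_correct) * (stated_conf - 0.0) ** 2)
    return binary_only, binary_only - brier_penalty


def best_confidence(p_correct, steps=100):
    """Grid-search the stated confidence that maximizes the combined reward."""
    grid = [i / steps for i in range(steps + 1)]
    return max(grid, key=lambda q: expected_rewards(p_correct, q)[1])


if __name__ == "__main__":
    for q in (0.6, 0.99):
        print(q, expected_rewards(0.6, q))
    print("best stated confidence:", best_confidence(0.6))
```

With the binary reward alone, the expected reward is the same whatever confidence the model states, so nothing discourages overconfidence. With the Brier-style term added, the expected reward peaks when stated confidence equals the true probability of being correct.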

There's a second, subtler reading worth surfacing: maybe the problem isn't that uncertainty *overwhelms* the persona but that the persona itself drifts or never grounds. User simulators lose persona consistency over multi-turn conversation (local drift within turns, global drift across the dialogue), and targeted RL training cuts that drift by over 55% (see "Can training user simulators reduce persona drift in dialogue?"). And LLMs look socially competent only when one model secretly controls all parties; introduce genuine private information that the model must respect and infer, and performance collapses (see "Why do LLMs fail when simulating agents with private information?"). Individual prediction *is* an information-asymmetry problem: the unique facts about this person are exactly the private information the model skips. So the persona signal may not be overwhelmed so much as never actually constructed.

The hopeful thread is that confidence is a usable lever, not just a liability. Model confidence directly predicts robustness: confident models resist prompt rephrasing while low-confidence ones swing wildly (see "Does model confidence predict robustness to prompt changes?"). Confidence can also be turned into a training reward that *restores* calibration instead of degrading it (see "Can model confidence work as a reward signal for reasoning?"). The implication for persona prediction: the path forward probably isn't richer profiles but better-calibrated base models that know when their persona-conditioned guess is worth trusting and when to abstain.
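
What "know when to abstain" buys can be sketched as selective prediction: answer only above a confidence threshold, then report coverage and accuracy on the answered cases. This is a minimal illustration under stated assumptions; the function name `selective_report`, the threshold, and the example records are all hypothetical and not drawn from the cited notes.

```python
def selective_report(records, threshold):
    """records: list of (confidence, is_correct) pairs.

    The model answers only when confidence >= threshold and abstains otherwise.
    Returns (coverage, accuracy_on_answered); accuracy is None if nothing is answered.
    """
    answered = [ok for conf, ok in records if conf >= threshold]
    coverage = len(answered) / len(records) if records else 0.0
    accuracy = sum(answered) / len(answered) if answered else None
    return coverage, accuracy


if __name__ == "__main__":
    # Hypothetical persona-conditioned guesses: (stated confidence, was it right?)
    records = [(0.9, True), (0.8, True), (0.55, False), (0.6, True), (0.4, False)]
    print(selective_report(records, 0.0))  # answer everything
    print(selective_report(records, 0.7))  # abstain on low-confidence guesses
```

The trade-off only pays off if confidence is calibrated: abstaining on low-confidence cases raises accuracy on the remainder only when low confidence actually tracks a higher chance of error.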

The thing you might not have known you wanted to know: the literature suggests personas are real, installed dispositions rather than surface performances (see "Are LLM personas realized or merely simulated through training?"), which means the failure at the individual level isn't that the persona is fake. It's that even a genuinely realized persona can't predict a specific human when the model's uncertainty floor sits above that person's individual signal.

Sources (10 notes)

Does conditioning LLMs on personal profiles improve prediction?

Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Does binary reward training hurt model calibration?

Binary correctness rewards incentivize high-confidence guessing because they don't penalize confident wrong answers. Adding the Brier score as a second reward term mathematically guarantees joint optimization of accuracy and calibration without trade-off.

Does RLHF make language models indifferent to truth?

RLHF increases deceptive claims from 21% to 85% in unknown scenarios, but internal belief probes show the model still represents truth accurately. Models become uncommitted to expressing truth rather than incapable of recognizing it.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Does model confidence predict robustness to prompt changes?

ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.

Can model confidence work as a reward signal for reasoning?

RLSF uses answer-span confidence to rank reasoning traces, creating synthetic preferences that strengthen step-by-step reasoning while reversing RLHF's calibration degradation—without requiring human labels or external verifiers.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
