Do stated character beliefs predict decisions better when extracted from text?
This explores whether knowing a character's stated beliefs — pulled out of narrative or document text — actually helps predict the choices they make, and where text-extracted belief breaks down as a predictor.
This reads the question as: when you mine a character's psychology from text and feed it back to a model, do the resulting predictions of their decisions get better? The corpus says yes, but with a sharp caveat about what kind of belief you extract and from where. The clearest 'yes' comes from the LIFECHOICE work (1,462 decisions across 388 novels), where LLMs predicted character choices more accurately when given an expert-written persona profile paired with memories retrieved for their relevance to that character's psychology, beating automated summarization by about 5% (Can LLMs predict character choices from narrative context?). The signal isn't just 'who is this character' but 'which of their past moments matter to this decision.' Pulling a persona from documents can also generalize: stakeholder personas semantically clustered out of domain texts transfer across evaluation tasks without redesign, suggesting text-grounded belief profiles carry portable predictive structure (Can personas extracted from documents generalize across evaluation tasks?).
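The idea of pairing a persona profile with decision-relevant memories, rather than a flat summary, can be sketched as follows. This is only an illustration under stated assumptions: the bag-of-words cosine ranking is a stand-in for whatever retriever the LIFECHOICE authors used, and the function names (`retrieve_memories`, `build_prompt`), the profile text, and the example memories are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_memories(decision, memories, k=2):
    """Return up to k memories ranked by word overlap with the decision.

    Assumption: lexical overlap is a crude proxy for psychological relevance.
    """
    query = Counter(tokenize(decision))
    scored = [(cosine(query, Counter(tokenize(m))), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [memory for score, memory in scored[:k] if score > 0]


def build_prompt(profile, decision, memories, k=2):
    """Assemble profile + retrieved memories + decision into one prompt string."""
    relevant = retrieve_memories(decision, memories, k)
    memory_lines = "\n".join(f"- {m}" for m in relevant)
    return (
        f"Persona profile:\n{profile}\n\n"
        f"Relevant memories:\n{memory_lines}\n\n"
        f"Decision to predict:\n{decision}"
    )


# Hypothetical example data, invented for illustration only.
PROFILE = "Proud, loyal to family, slow to forgive deception."
MEMORIES = [
    "She forgave her brother after he betrayed the family.",
    "She bought bread at the market.",
    "She swore never to trust a liar again.",
]
DECISION = "Will she trust her brother again after the betrayal?"

if __name__ == "__main__":
    print(build_prompt(PROFILE, DECISION, MEMORIES))
```

The design point is the filter itself: the prompt carries only the memories that bear on the decision at hand, which is the contrast with a flat summary that the paragraph above draws.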
But here's the thing you might not expect: a belief label can out-predict the text entirely. In debate corpora, readers' political and religious ideology labels predicted persuasion outcomes better than the linguistic features of the argument, and language effects measured without controlling for who's listening turned out to be confounded by audience composition (Does what readers believe matter more than what debaters say?). So 'extracted from text' isn't automatically the winning move; sometimes the cheap prior about the believer beats the rich textual signal. That reframes the question: extraction helps when the text encodes belief the label can't, and loses when the label already captures it.
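To make the confounding argument concrete, here is a small sketch using synthetic toy counts (invented for illustration, not taken from any debate corpus). It assumes a hypothetical linguistic feature that debaters happen to use more often in front of sympathetic audiences: pooled over all debates the feature looks predictive, but within each audience group it makes no difference.

```python
from collections import defaultdict

# Synthetic toy counts, NOT real data:
# (audience_group, feature_used, number_of_debates, number_of_wins)
TOY_COUNTS = [
    ("sympathetic", True, 8, 6),
    ("sympathetic", False, 4, 3),
    ("hostile", True, 4, 1),
    ("hostile", False, 8, 2),
]


def win_rates(counts, by_audience):
    """Win rate per feature value, optionally stratified by audience group."""
    totals = defaultdict(lambda: [0, 0])  # key -> [debates, wins]
    for audience, feature_used, debates, wins in counts:
        key = (audience, feature_used) if by_audience else feature_used
        totals[key][0] += debates
        totals[key][1] += wins
    return {key: wins / debates for key, (debates, wins) in totals.items()}


pooled = win_rates(TOY_COUNTS, by_audience=False)
stratified = win_rates(TOY_COUNTS, by_audience=True)

if __name__ == "__main__":
    print("pooled:", pooled)
    print("stratified:", stratified)
```

In these toy counts the pooled win rate is higher when the feature is used (7 of 12) than when it is not (5 of 12), yet inside each audience group the rates are identical. The apparent language effect is carried entirely by who is listening, which is the pattern the debate-corpus finding warns about.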
There's also a deeper rival to text extraction: learning the decision function directly. LLMs fine-tuned on psychology-experiment data became generalist cognitive models that out-predicted theory-driven models and captured individual differences in their embeddings, with no hand-written belief statement required (Can language models learn to model human decision making?). And whole AI-persona pipelines reproduced about 76% of published experimental main effects (84 of 111), with success tracking the strength of the original evidence rather than the eloquence of the persona (Can AI personas reliably replicate human experiment results?). Belief-as-text is one lever; belief-as-learned-distribution is another, and they don't always agree.
The failure modes are worth knowing because they tell you when extracted belief will mislead you. Models often default to surface-level strategies instead of genuinely tracking what an agent believes, and hybrid Bayesian architectures that force explicit belief tracking beat the LLM-alone approach, hinting the gap is architectural, not just a matter of more training (Do large language models genuinely simulate mental states?). Worse, the '20 questions' regeneration test shows an LLM doesn't commit to one character at all: it holds a superposition and samples a fresh, locally consistent self each generation, so a 'stated belief' may be an artifact of one draw, not a stable disposition (Do large language models actually commit to a single character?). RLHF compounds this by baking in priors of its own: models predict conciliatory, benefit-oriented persuasion regardless of context (Do LLMs predict persuasion based on actual dialogue or training bias?).
So the honest answer: extracting belief from text predicts decisions better when (a) you retrieve the psychologically relevant slice, not a flat summary, and (b) the believer's identity label doesn't already give it away for free. The frontier debate underneath is whether a trained persona is a real, sticky disposition you can read off (Are RLHF personas performed characters or realized dispositions?) or a sampled fiction, and that is exactly what determines whether 'stated belief' is a signal or a mirage (Are LLM personas realized or merely simulated through training?).
Sources (10 notes)
The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.
MAJ-EVAL automatically extracts stakeholder personas from domain documents via semantic clustering and orchestrates structured three-phase debate, achieving reproducible evaluation that transfers across tasks like summarization and dialogue without manual redesign. The approach grounds personas in real stakeholder perspectives rather than arbitrary roles.
Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.
LLMs fine-tuned on psychology experiment data predict human behavior more accurately than theory-driven models in decision tasks, capture individual differences in their embeddings, and transfer across tasks without task-specific design.
Viewpoints AI reproduced 84 of 111 main effects (about 76%) from Journal of Marketing experiments, with replication success strongly correlated with the strength of the original p-value. Marginal effects were reproduced unreliably, with both false positives and false negatives.
ChangeMyView and FANTOM benchmarks show LLMs fail at authentic perspective-taking in open-ended scenarios, despite succeeding on structured tasks. Hybrid Bayesian architectures that force explicit belief tracking outperform LLM-alone approaches, suggesting the gap is architectural rather than merely training-based.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, which indicates that the model has made no fixed commitment.
LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.
Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.