Can personas extracted from documents generalize across evaluation tasks?
This note asks whether automating persona creation from domain documents, rather than hand-crafting roles, lets multi-agent evaluators transfer across tasks without redesign. The question matters because manually crafted personas tend not to generalize across domains.
Multi-agent evaluation frameworks like ChatEval assign agents to pre-defined roles ("general public," "critic") and manually craft evaluation dimensions. This works for one task but fails to generalize: a "critic" in summarization may not carry the same evaluative priorities to dialogue generation. MAJ-EVAL (2025) addresses this by automating the entire persona creation pipeline from domain documents.
The process has two steps. First, evaluative dimension extraction: given domain-specific documents (e.g., research papers), the system identifies stakeholders (parents, clinicians, educators) and their associated perspectives, priorities, and evaluation criteria — with evidence chains linking dimensions to specific claims in the source documents. Semantically similar stakeholders are clustered and redundant dimensions merged, preserving diversity within groups.
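To make the consolidation step concrete, here is a minimal Python sketch of clustering and merging extracted dimensions. The `Dimension` record, the `similar` test, and the greedy `consolidate` loop are illustrative assumptions: MAJ-EVAL extracts dimensions with an LLM and clusters by semantic embedding, while this stand-in uses difflib string similarity purely to show the merge logic.

```python
# Sketch of the consolidation step only. MAJ-EVAL identifies stakeholders
# and criteria with an LLM and clusters them by semantic embedding; the
# crude string-similarity test below (difflib) is a stand-in.
from dataclasses import dataclass, field
from difflib import SequenceMatcher

@dataclass
class Dimension:
    stakeholder: str          # e.g. "clinician"
    criterion: str            # e.g. "clinical accuracy of the summary"
    evidence: list[str] = field(default_factory=list)  # claims in source docs

def similar(a: str, b: str, threshold: float = 0.75) -> bool:
    """Stand-in for semantic similarity between two short phrases."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def consolidate(dimensions: list[Dimension]) -> list[Dimension]:
    """Greedily merge redundant dimensions, keeping all evidence chains."""
    merged: list[Dimension] = []
    for dim in dimensions:
        for kept in merged:
            if similar(dim.stakeholder, kept.stakeholder) and similar(
                dim.criterion, kept.criterion
            ):
                kept.evidence.extend(dim.evidence)  # merge, don't duplicate
                break
        else:
            merged.append(dim)  # novel stakeholder/criterion pair survives
    return merged
```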
Second, dimension-based persona construction: for each consolidated dimension, a detailed persona is constructed with five attributes — demographic information, evaluative dimension, domain specialty, psychological traits, and social relationships. These personas ground the evaluation agents in real stakeholder perspectives rather than arbitrary role assignments.
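A persona with these five attributes maps naturally onto a small record plus a prompt template. The schema below is an assumption (the paper names the attribute categories, not a field layout), and `persona_prompt` is a hypothetical helper showing how such a record could ground an evaluator agent.

```python
# Illustrative persona schema with the five attributes named above.
# Field names and the prompt rendering are assumptions, not the paper's spec.
from dataclasses import dataclass

@dataclass
class Persona:
    demographics: str          # e.g. "42-year-old pediatric nurse, urban hospital"
    evaluative_dimension: str  # the consolidated dimension this persona embodies
    domain_specialty: str      # e.g. "pediatric oncology"
    psychological_traits: str  # e.g. "detail-oriented, risk-averse"
    social_relationships: str  # e.g. "liaise with parents and care teams"

def persona_prompt(p: Persona) -> str:
    """Render the persona as a system prompt grounding an evaluator agent."""
    return (
        f"You are {p.demographics}, a specialist in {p.domain_specialty}. "
        f"You are {p.psychological_traits}, and you {p.social_relationships}. "
        f"Evaluate the text strictly from this perspective, focusing on: "
        f"{p.evaluative_dimension}."
    )
```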
The evaluation itself runs in three phases: (1) individual agent assessment from unique perspectives, (2) multi-agent in-group free debate moderated by a coordinating agent that prioritizes unresolved disagreements, and (3) aggregation across groups combining qualitative synthesis with quantitative score averaging. This mirrors how real stakeholder groups deliberate — initial positions → debate → consensus.
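A compressed sketch of the three phases follows, assuming a hypothetical `llm(prompt) -> str` helper and persona prompt strings as inputs. The moderator here simply re-surfaces disagreements for a fixed number of rounds, and the qualitative synthesis in phase 3 is omitted; only the quantitative averaging is shown.

```python
# Minimal sketch of the three-phase flow under stated assumptions:
# `llm` is a caller-supplied prompt -> completion function (hypothetical),
# and each agent's reply contains a 1-10 score somewhere in its text.
import re
from statistics import mean

def parse_score(view: str) -> float:
    """Crudely pull the first 1-10 integer from an agent's reply."""
    m = re.search(r"\b([1-9]|10)\b", view)
    return float(m.group(1)) if m else 5.0  # neutral fallback

def evaluate(text: str, groups: list[list[str]], llm, rounds: int = 2) -> float:
    group_scores = []
    for personas in groups:
        # Phase 1: independent assessment from each persona's perspective.
        views = [llm(f"{p}\n\nAssess:\n{text}\nGive a 1-10 score and rationale.")
                 for p in personas]
        # Phase 2: in-group debate; a moderator surfaces unresolved disagreements.
        for _ in range(rounds):
            conflict = llm("Moderator: list unresolved disagreements among:\n"
                           + "\n---\n".join(views))
            views = [llm(f"{p}\nOthers raised:\n{conflict}\n"
                         f"Revise your score and rationale.") for p in personas]
        # Within-group aggregation: average the final numeric scores.
        group_scores.append(mean(parse_score(v) for v in views))
    # Phase 3 across groups: quantitative average (qualitative synthesis omitted).
    return mean(group_scores)
```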
The key advantage is reproducibility and transferability. Because personas are extracted from documents rather than hand-crafted, the same pipeline applies to children's storybook QA and medical literature summarization without redesign. Relative to the open question of how to generate realistic personas at population scale, the document-grounded approach provides the calibration anchor that ad hoc persona generation lacks.
Source: Agents Multi
Related concepts in this collection
- How do we generate realistic personas at population scale? Current LLM-based persona generation relies on ad hoc methods that fail to capture real-world population distributions; the challenge is reconstructing the joint correlations between demographic, psychographic, and behavioral attributes from fragmented data. (Relevance: the calibration problem document-grounded personas address.)
- Can LLM judges be fooled by fake credentials and formatting? Explores whether language models evaluating text fall for authority signals and visual presentation unrelated to actual content quality, and whether these weaknesses can be exploited without deep model knowledge. (Relevance: multi-agent debate with diverse personas as structural bias mitigation.)
- Can AI agents learn people better from interviews than surveys? Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people. (Relevance: content depth matters for persona quality.)
- Can AI systems detect when they've genuinely reached agreement? When multiple AI agents debate, they often converge without actually deliberating; can a dedicated agent reliably identify true agreement versus false consensus, and would that improve debate outcomes? (Relevance: structured debate mechanisms for evaluation.)
Original note title: automated stakeholder persona extraction from domain documents enables cross-task generalizable multi-agent evaluation