How much does interview richness matter compared to model capability for persona accuracy?
This explores whether the *input* that defines a persona — the richness of an interview, a profile, a transcript — matters more for accuracy than the raw capability of the model running it. The corpus points, somewhat surprisingly, in one clear direction: what you feed the persona matters far more than how powerful the model is.
The most direct evidence comes from interview-based agents. When researchers built agents from two-hour voice interviews with 1,052 real people, those agents replicated participants' responses about 85% as accurately as the participants replicated their own answers, and the driver was *factual content*, not linguistic style. Even when the interviews were reduced to summary bullet points, the agents kept 83% fidelity (see "Can AI agents learn people better from interviews than surveys?"). In other words, the substance the interview captured did the work, not surface mimicry.
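One way to read the "85% as accurately as the people replicated themselves" figure is as a ratio: the agent's match rate divided by the participant's own retest match rate. The sketch below assumes that reading. The function name and the counts in the usage line are hypothetical illustrations, not data from the study.

```python
def normalized_accuracy(agent_matches: int, self_matches: int, total_items: int) -> float:
    """Agent accuracy relative to the participant's own retest consistency.

    agent_matches: items where the agent's answer matched the participant's original answer
    self_matches:  items where the participant's later answer matched their original answer
    total_items:   number of items compared
    (All three inputs are hypothetical quantities used for illustration.)
    """
    if total_items <= 0:
        raise ValueError("total_items must be positive")
    agent_accuracy = agent_matches / total_items
    self_accuracy = self_matches / total_items
    if self_accuracy == 0:
        raise ValueError("self-consistency is zero; the ratio is undefined")
    return agent_accuracy / self_accuracy


# Hypothetical counts: the agent matches 68 of 100 items, the participant
# matches their own earlier answers on 80 of 100, giving a ratio of about 0.85.
ratio = normalized_accuracy(agent_matches=68, self_matches=80, total_items=100)
print(f"normalized accuracy: {ratio:.2f}")
```

Dividing by the participant's own consistency keeps the agent from being penalized for items where people themselves answer differently on a second attempt.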
Now set that against model capability. Moving from GPT-3.5 to Claude 3.5 Sonnet, an enormous leap in general ability, bought only a 2.97% gain in persona consistency. This suggests persona adherence is largely *orthogonal* to model scaling, because standard training optimizes per-turn quality rather than staying in character across a conversation (see "Does model capability translate to better persona consistency?"). Simply handing a capable model someone's profile and asking it to predict that individual does not help either: across more than 200,000 participants, conditioning on personal profiles produced no measurable improvement in person-level prediction (see "Does conditioning LLMs on personal profiles improve prediction?"). Capability alone doesn't rescue a thin input.
What *does* lift accuracy is richer, better-structured input. Realistic synthetic dialogue needs three multiplicative layers working together (subtopic specificity, personality variation, and contextual detail) to recover about 90% of real-dialogue performance (see "Can synthetic dialogues become realistic through layered diversity?"). Persona fidelity also can't be optimized in isolation: chasing persona-consistency scores alone leads models to copy character descriptions while ignoring what is actually being asked, so persona and context have to be tuned jointly (see "Do persona consistency metrics actually measure dialogue quality?"). Even where AI personas succeed at scale, replicating 76% of published experimental effects, the wins track the *strength of the underlying signal* (effects with stronger original evidence replicate more reliably), not model horsepower (see "Can AI personas reliably replicate human experiment results?").
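The "multiplicative" part of the layered approach can be pictured as a cross product: every subtopic is paired with every personality setting and every context, so diversity multiplies rather than adds. The sketch below is a minimal illustration under that assumption. The layer values and the `build_dialogue_specs` helper are hypothetical and are not taken from the cited work.

```python
from itertools import product

# Hypothetical example values for each of the three layers.
SUBTOPICS = ["billing dispute", "late delivery"]
PERSONALITIES = ["high agreeableness", "low agreeableness"]
CONTEXTS = ["first contact, calm", "repeat contact, frustrated"]


def build_dialogue_specs(subtopics, personalities, contexts):
    """Combine the three layers into one spec per (subtopic, personality, context)."""
    return [
        {"subtopic": s, "personality": p, "context": c}
        for s, p, c in product(subtopics, personalities, contexts)
    ]


specs = build_dialogue_specs(SUBTOPICS, PERSONALITIES, CONTEXTS)
print(len(specs))  # 2 * 2 * 2 = 8 distinct specs
```

Each spec would then seed one generation prompt. Dropping any single layer shrinks the set by that layer's size, which is one intuition for why the three layers are described as working together.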
The takeaway: persona accuracy looks less like a problem you solve with a smarter model and more like a problem you solve with a richer recording of the actual person. Approaches like test-time personas that keep learning from real interaction data lean into exactly that (see "Can personas evolve in real time to match what users actually want?"). If you want a sharper persona, the marginal hour is better spent on the interview than on the model upgrade.
Sources (7 notes)
A 1,052-person study found agents built from voice interviews replicated participant responses nearly as well as people replicate their own answers. Factual content, not linguistic style, drove this accuracy—even summary bullet points retained 83% fidelity.
Claude 3.5 Sonnet achieved only a 2.97% improvement over GPT-3.5 on persona consistency despite a massive capability gap, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.
High persona adherence scores often come from copying character descriptions while ignoring query relevance. MUDI jointly optimizes both by using discourse relations and graph-based coherence modeling alongside persona fidelity, showing that persona and context must be optimized together, not separately.
Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.
PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.