Persona Generators: Generating Diverse Synthetic Personas at Scale

Paper · arXiv 2602.03545 · Published February 3, 2026
Tags: Personas · Personality Psychology · Users

Evaluating AI systems that interact with humans requires understanding their behavior across diverse user populations, but collecting representative human data is often expensive or infeasible, particularly for novel technologies or hypothetical future scenarios. Recent work in Generative Agent-Based Modeling has shown that large language models can simulate human-like synthetic personas with high fidelity, accurately reproducing the beliefs and behaviors of specific individuals. However, most approaches require detailed data about target populations and often prioritize density matching (replicating what is most probable) rather than support coverage (spanning what is possible), leaving long-tail behaviors underexplored. We introduce Persona Generators, functions that can produce diverse synthetic populations tailored to arbitrary contexts. We apply an iterative improvement loop based on AlphaEvolve, using large language models as mutation operators to refine our Persona Generator code over hundreds of iterations. The optimization process produces lightweight Persona Generators that can automatically expand short descriptions into populations of diverse synthetic personas that maximize coverage of opinions and preferences along relevant diversity axes. We demonstrate that evolved generators substantially outperform existing baselines across six diversity metrics on held-out contexts, producing populations that span rare trait combinations difficult to achieve in standard LLM outputs.

An increasingly promising alternative uses simulated users (Anthis et al., 2025). Recent progress in Generative Agent-Based Modeling (GABM) has made it possible to construct synthetic personas that display coherent preferences, attitudes, and behaviors, with applications ranging from simulating complex social interactions (Park et al., 2023) to replicating behavioral patterns in economic games (Slumbers et al., 2025). Yet most existing work in GABM focuses on algorithmic fidelity—how accurately synthetic personas reproduce observed human response patterns, often evaluated by matching the aggregate statistics or distributions derived from real data (Argyle et al., 2023; Park et al., 2024). This objective suits applications like digital twins, where the target distribution is known and fidelity is essential. However, it typically requires substantial data from the population of interest and implicitly emphasizes density matching. In practice, this often collapses onto a narrow subset of stereotypical responses, failing to capture the full support of human behavior (Anthis et al., 2025).

In this paper, our goal is to create populations of personas capturing the full support of possible attitudes, preferences, and response patterns, including rarer but consequential configurations typically under-represented in LLM-generated populations. In many stress-testing settings, it is the outliers, not the average user, that drive critical failures, so leaving the long tail underexplored can create a false sense of robustness. To discover edge cases and identify safety failures, it is necessary to explore all possible users, not just the most probable ones. For example, stress-testing a mental health chatbot requires handling rare, distrustful users with severe symptoms. This focus on support becomes even more important in speculative future scenarios, such as forecasting societal adaptation to AGI, where the true distribution of human responses is simply unknown. Even when data exists, the observed response distributions are often much narrower than the underlying population and biased toward easily accessible WEIRD subpopulations (Western, Educated, Industrialized, Rich, Democratic) or desirable traits like high agreeableness, leaving rare but consequential behaviors underrepresented (Bisbee et al., 2024; Anthis et al., 2025; Petrov et al., 2024). Ultimately, maximizing coverage is more flexible: if the full support is covered, one can always later subsample the population to match any specific target density. However, eliciting such diversity from LLM-based personas is far from trivial. Naive prompting often leads to mode collapse, stereotypical outputs, and systematic biases, partly a byproduct of Reinforcement Learning from Human Feedback (RLHF) tuning, even when explicit instructions for diversity are provided (Santurkar et al., 2023; Li et al., 2025).

We find that simply asking an LLM to "generate diverse personas" typically yields populations clustered around stereotypical responses, failing to cover extreme or unusual trait combinations.

This motivates a different perspective on user simulation. Rather than constructing a single fixed population of synthetic users, we propose to learn a reusable Persona Generator: a function capable of producing diverse synthetic personas on demand for any arbitrary context. This generator must be robust enough to handle the immense variety of potential scenarios that we may care about. Importantly, we aim to optimize a generally capable Persona Generator, specifically the code that samples and constructs personas, moving beyond the limitations of optimizing specific populations.
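The proposed interface can be pictured as a small function: given a context description, return a population of persona descriptions. The sketch below is an illustrative stand-in, not the paper's implementation; the diversity axes, level names, and sentence template are hypothetical, and the background-expansion step is reduced to string templating where the paper would invoke an LLM.

```python
import random
from itertools import product

def persona_generator(context, n_personas=8, seed=0):
    """Illustrative Persona Generator sketch (axes and wording are hypothetical).

    First, population-level diversity decisions: enumerate the full Cartesian
    product of the diversity-axis levels and sample persona slots from it
    without replacement, so rare trait combinations are guaranteed to appear
    rather than left to i.i.d. sampling. Then each slot is expanded into a
    persona description (a one-line template here, an LLM call in practice).
    """
    rng = random.Random(seed)
    axes = {
        "trust_in_technology": ["very low", "low", "moderate", "high"],
        "agreeableness": ["very low", "low", "moderate", "high"],
    }
    combos = list(product(*axes.values()))
    chosen = rng.sample(combos, min(n_personas, len(combos)))
    return [
        f"A {context} user with {trust} trust in technology "
        f"and {agree} agreeableness."
        for trust, agree in chosen
    ]
```

Because slots are drawn without replacement from the product of axis levels, every returned persona occupies a distinct cell of the trait space, which is exactly the support-coverage behavior naive prompting fails to deliver.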

In this work, we introduce a methodology for learning Persona Generator functions that maximize diversity in traits, opinions, and preferences across arbitrary contexts. Starting from a short textual description, we first expand the context into a structured questionnaire that defines a set of diversity axes. A Persona Generator then produces a population of synthetic individuals intended to span all possible traits, opinions, and preferences defined by those axes. To overcome the difficulties of achieving diversity through standard prompting, we frame the task as an optimization problem over the Persona Generator's code, including prompt templates and sampling logic, using an evolutionary search loop powered by AlphaEvolve (Novikov et al., 2025). The resulting Persona Generator is lightweight and efficient, enabling rapid, one-shot population synthesis for downstream applications. This separation between a costly training phase and a cheap inference phase makes Persona Generators practical for repeated use across domains, even when the original optimization context differs from the deployment setting.
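The optimization loop can be sketched as a greedy hill-climbing variant of this kind of evolutionary search: score a candidate generator by the diversity of the population it produces, ask a mutation operator for a modified candidate, and keep the better of the two. In the paper the mutation operator is an LLM editing the generator's code (AlphaEvolve-style); in this sketch it is a stand-in that perturbs numeric parameters, and the fitness function is a simple mean pairwise distance. All function names and parameters below are illustrative assumptions, not the paper's.

```python
import random

def evaluate_diversity(population):
    """Toy fitness: mean pairwise L1 distance between persona trait vectors."""
    n = len(population)
    if n < 2:
        return 0.0
    total, pairs = 0.0, 0
    for i in range(n):
        for j in range(i + 1, n):
            total += sum(abs(a - b) for a, b in zip(population[i], population[j]))
            pairs += 1
    return total / pairs

def mutate(params, rng):
    """Stand-in for the LLM mutation operator: perturb generator parameters
    instead of rewriting generator code."""
    spread, n_personas = params
    return (max(0.1, spread + rng.uniform(-0.5, 0.5)), n_personas)

def run_generator(params, rng):
    """Stand-in Persona Generator: sample 4-dim trait vectors with a given spread."""
    spread, n_personas = params
    return [[rng.uniform(-spread, spread) for _ in range(4)]
            for _ in range(n_personas)]

def evolve(initial_params, iterations=200, seed=0):
    """Greedy loop: propose a mutated generator, keep it if its population
    scores higher on the diversity metric."""
    rng = random.Random(seed)
    best = initial_params
    best_score = evaluate_diversity(run_generator(best, rng))
    for _ in range(iterations):
        child = mutate(best, rng)
        score = evaluate_diversity(run_generator(child, rng))
        if score > best_score:
            best, best_score = child, score
    return best, best_score
```

The key property this preserves from the paper's setup is the separation of phases: the loop (training) is expensive because every candidate must be evaluated on a generated population, while the surviving generator (inference) is a cheap, reusable function.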

• We formalize the problem of synthetic persona generation as a diversity maximization task over trait and preference embeddings, explicitly shifting the objective from algorithmic fidelity (matching specific individuals) to support coverage (spanning the space of possible traits, opinions, and preferences).

• We propose a novel Persona Generator function with a scalable two-stage architecture that separates population-level diversity decisions from per-persona background expansion, enabling both control and efficiency, coupled with a pipeline that uses LLMs to generate questionnaires, simulate interactions, and evolve code.

• We demonstrate that LLM-driven evolution can discover novel Persona Generator functions that substantially outperform a range of baselines on coverage and diversity metrics.
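The shift from algorithmic fidelity to support coverage in the first contribution can be made concrete with a toy metric: discretize each trait axis into bins and count the fraction of cells in the trait space touched by at least one persona. This is an illustrative metric under assumed bounded trait vectors, not one of the paper's six; note that a population tightly concentrated in the most probable cells (good density matching) scores low on it.

```python
def support_coverage(population, bins=4, lo=-1.0, hi=1.0):
    """Fraction of discrete cells of the trait space [lo, hi]^d touched by
    at least one persona, with `bins` cells per axis.

    Illustrative support-coverage metric: rewards spanning distinct regions
    of the trait space, regardless of how many personas land in each.
    """
    d = len(population[0])
    width = (hi - lo) / bins
    touched = set()
    for trait_vector in population:
        # Map each coordinate to its bin index, clamping the upper edge.
        cell = tuple(min(bins - 1, int((x - lo) / width)) for x in trait_vector)
        touched.add(cell)
    return len(touched) / bins ** d
```

For example, two personas at opposite ends of a single axis cover both of two bins (coverage 1.0), while two near-duplicate personas cover one (coverage 0.5), even though the duplicates might match a unimodal target density better.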

We introduced Persona Generators, functions that produce diverse synthetic populations tailored to arbitrary contexts and optimized through evolutionary search. Evolved generators substantially outperform existing baselines across six diversity metrics and generalize to held-out contexts, producing populations that span rare trait combinations difficult to elicit through standard prompting. While challenges remain in measuring diversity for open-ended and interactive behaviors, our results show that optimizing the generator itself, rather than individual personas or fixed populations, is a viable and promising direction. The resulting Persona Generators are lightweight and reusable, enabling on-demand synthesis of diverse populations, and we plan to open-source the top-performing implementations to support further research.