Why do marginal effects fail to replicate in AI persona simulations?

This explores why AI persona simulations can reproduce strong, headline experimental results but break down on small, conditional effects, and what that failure tells us about what these simulations actually track. The most direct evidence in the corpus is striking: when AI personas were run against published Journal of Marketing experiments, they reproduced 84 of 111 main effects (about 76%), and replication success was strongly correlated with the strength of the original study's p-value (Can AI personas reliably replicate human experiment results?). Marginal effects, meaning the weaker, conditional, interaction-style findings, produced both false positives and false negatives. So the failure isn't random: the simulation tracks *strength of evidence*, and marginal effects are, by definition, the low-signal cases.
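
One way to see why replication would track the original p-value is the ordinary power curve of a significance test under a fixed noise floor. The sketch below is a generic illustration, not the method used in the study: it assumes normally distributed outcomes, a two-arm design analysed with a two-sided z-test, and hypothetical defaults (`noise_sd=1.0`, `n_per_arm=50`, `alpha=0.05`). The name `detection_probability` is illustrative.

```python
import math
from statistics import NormalDist


def detection_probability(effect: float, noise_sd: float = 1.0,
                          n_per_arm: int = 50, alpha: float = 0.05) -> float:
    """Probability that a two-sided z-test flags a true mean difference `effect`.

    Illustrative assumptions: normal outcomes with standard deviation
    `noise_sd` in each arm, equal arm sizes, known variance.
    """
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # about 1.96 for alpha = 0.05
    standard_error = noise_sd * math.sqrt(2 / n_per_arm)
    shift = effect / standard_error                    # expected z-statistic
    # Reject when z > z_crit or z < -z_crit, with z ~ Normal(shift, 1).
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    for effect in (0.0, 0.1, 0.3, 1.0):
        print(f"effect={effect:.1f}  P(detected)={detection_probability(effect):.3f}")
```

Under these assumptions a shift of 1.0 standard deviations is detected almost every time, while a shift of 0.1 is flagged only slightly more often than the 5% false-positive rate at zero effect. Raising `noise_sd` pushes the whole curve down, and the small effects drop toward chance first.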

The mechanism becomes clearer when you look at run-to-run stability. When the same persona prompt is run repeatedly, the variance across runs matches or exceeds the variance across entirely different personas (Why do LLM persona prompts produce inconsistent outputs across runs?). That implies a noise floor built into persona outputs, driven by the model's own uncertainty rather than by any stable social knowledge it draws on. A large main effect sits comfortably above that floor and survives. A marginal effect is small enough to be swamped by it, which is why it flickers in and out as false positives and false negatives. Seen this way, replication tracking p-value strength and noise-driven instability are two descriptions of the same thing: how far an effect sits above the noise floor.
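
The run-to-run comparison can be checked on your own outputs with a simple variance decomposition: score each persona prompt over several repeated runs, then compare the average within-persona variance with the variance of the per-persona means. This is a minimal sketch; the function name `variance_components` and the toy scores are hypothetical, and a fuller analysis would more likely fit a random-effects model or report an intraclass correlation.

```python
import statistics


def variance_components(runs_by_persona: dict[str, list[float]]) -> tuple[float, float]:
    """Return (within_persona_variance, between_persona_variance).

    within:  mean of each persona's sample variance across repeated runs
             (the run-to-run noise floor).
    between: sample variance of the per-persona mean scores
             (how much persona identity moves the output).
    Needs at least two personas, each with at least two runs.
    """
    within = statistics.mean(
        [statistics.variance(runs) for runs in runs_by_persona.values()]
    )
    between = statistics.variance(
        [statistics.mean(runs) for runs in runs_by_persona.values()]
    )
    return within, between


if __name__ == "__main__":
    # Hypothetical scores from re-running two persona prompts twice each.
    toy_runs = {"persona_a": [1.0, 3.0], "persona_b": [2.0, 4.0]}
    within, between = variance_components(toy_runs)
    print(f"within={within}, between={between}")
    if within >= between:
        print("Run-to-run noise is at least as large as the persona signal.")
```

When the within-persona figure matches or exceeds the between-persona figure, as in this toy case, any effect smaller than the run-to-run spread is hard to tell apart from sampling noise.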

There's a second, sneakier source of distortion: persona-assigned models don't just sample noisily, they reason with a thumb on the scale. Assigning a persona induces identity-congruent bias, with models 90% more likely to accept evidence that matches their assigned identity, and standard prompt-based debiasing fails to remove it (Do personas make language models reason like biased humans?). For a strong effect this barely matters; for a marginal one, a systematic tilt of that size is enough to manufacture an effect that wasn't there or erase one that was.
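
A deterministic toy makes the asymmetry concrete: treat the persona's reported effect as the true effect plus a constant tilt, and count anything below a fixed threshold as undetectable. The threshold stands in for the noise floor and the tilt for identity-congruent bias; both numbers are hypothetical, sampling noise is ignored, and `classify_outcome` is an illustrative name. The 90% figure above describes acceptance rates, so it is not converted into effect units here.

```python
def classify_outcome(true_effect: float, bias: float, threshold: float) -> str:
    """Label what a constant bias does to an effect, relative to a detection threshold.

    Hypothetical model: observed = true_effect + bias, and an effect counts as
    detected only if its magnitude reaches `threshold` (a stand-in for the noise floor).
    """
    observed = true_effect + bias
    real = abs(true_effect) >= threshold
    seen = abs(observed) >= threshold
    if seen and not real:
        return "false positive"   # bias manufactured an effect
    if real and not seen:
        return "false negative"   # bias erased a real effect
    if real and seen and (observed > 0) != (true_effect > 0):
        return "sign flip"        # bias reversed the direction
    return "preserved" if real else "correct null"


if __name__ == "__main__":
    threshold = 0.25  # hypothetical detection threshold
    bias = -0.3       # hypothetical identity-congruent tilt
    for true_effect in (1.0, 0.3, 0.0):
        print(true_effect, classify_outcome(true_effect, bias, threshold))
```

The same -0.3 tilt leaves the large effect detectable, erases the 0.3 effect, and manufactures one where the true effect is zero, which is the strong-versus-marginal asymmetry described above.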

What ties these together is the question of what the simulation is grounded in. Persona competence often turns out to be an artifact of easy conditions: models look socially capable when one model controls all interlocutors, but fail systematically once agents must reason under private information they don't share (Why do LLMs fail when simulating agents with private information?). Marginal effects in real human studies tend to live precisely in those harder regions, because they are subtle, context-dependent, and rely on the grounding work the model skips. And because naive persona prompting tends to collapse toward the dense, typical center of a population while missing rare but consequential configurations (Should persona simulation prioritize coverage over statistical matching?), the very tail conditions that produce marginal effects are under-sampled to begin with.
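
The under-sampling point is simple probability: a configuration with population share p appears at least once among n density-matched draws with probability 1 - (1 - p)^n. The sketch below contrasts density-matched sampling with a plain coverage sampler that enumerates every configuration once. It is not the evolutionary Persona Generator approach from the cited note; the three independent binary traits and their 90/10 split are hypothetical.

```python
import itertools
import math
import random

# Hypothetical population: three independent traits, each "typical" 90% of the time.
LEVEL_SHARE = {"typical": 0.9, "atypical": 0.1}
N_TRAITS = 3
ALL_CONFIGS = list(itertools.product(LEVEL_SHARE, repeat=N_TRAITS))


def config_share(config: tuple[str, ...]) -> float:
    """Population share of one trait configuration (traits assumed independent)."""
    return math.prod(LEVEL_SHARE[level] for level in config)


def density_matched_sample(n: int, seed: int = 0) -> list[tuple[str, ...]]:
    """Draw n personas in proportion to their population share."""
    rng = random.Random(seed)
    weights = [config_share(c) for c in ALL_CONFIGS]
    return rng.choices(ALL_CONFIGS, weights=weights, k=n)


def coverage_sample() -> list[tuple[str, ...]]:
    """One persona per configuration, however rare it is."""
    return list(ALL_CONFIGS)


def chance_of_seeing(config: tuple[str, ...], n: int) -> float:
    """Probability that `config` appears at least once in n density-matched draws."""
    return 1 - (1 - config_share(config)) ** n


if __name__ == "__main__":
    rarest = ("atypical",) * N_TRAITS
    print(f"share of rarest config: {config_share(rarest):.3f}")
    print(f"P(seen in 100 density-matched draws): {chance_of_seeing(rarest, 100):.3f}")
    print("distinct configs in a 100-draw density sample:",
          len(set(density_matched_sample(100))))
    print("configs in the coverage sample:", len(coverage_sample()))
```

With these numbers the all-atypical persona has a 0.1% share and under a 10% chance of appearing in 100 density-matched draws, whereas the coverage sampler includes it by construction.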

The thing worth carrying away: a persona simulation that 'replicates 76% of findings' isn't 76% reliable across the board. It is highly reliable on strong effects and far less reliable, closer to a coin flip, on weak ones. That makes these tools useful for confirming robust phenomena and actively misleading for the frontier cases researchers most want to probe, which is usually where the real scientific action is.
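
The arithmetic behind that warning is ordinary stratification: a pooled replication rate is a weighted average of per-stratum rates, so it can look respectable while one stratum sits near chance. A minimal sketch follows; the stratum labels and counts are hypothetical, not the study's data, and `replication_rates` is an illustrative name.

```python
from collections import defaultdict


def replication_rates(results):
    """Return (overall_rate, {stratum: rate}) from (stratum, replicated) pairs.

    `stratum` might be a bin of the original study's p-value; `replicated`
    is True when the simulation reproduced the original finding.
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for stratum, replicated in results:
        totals[stratum] += 1
        hits[stratum] += int(replicated)
    per_stratum = {s: hits[s] / totals[s] for s in totals}
    overall = sum(hits.values()) / sum(totals.values())
    return overall, per_stratum


if __name__ == "__main__":
    # Hypothetical outcomes, chosen only to show how pooling hides the gap.
    toy = ([("strong", True)] * 9 + [("strong", False)] * 1
           + [("marginal", True)] * 5 + [("marginal", False)] * 5)
    overall, per_stratum = replication_rates(toy)
    print(f"overall={overall:.2f}", per_stratum)
```

Here a pooled 70% hides a 90% rate on strong effects and a 50% rate on marginal ones; reporting per-stratum rates, for instance binned by the original p-value, keeps that gap visible.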

Sources (5 notes)

Can AI personas reliably replicate human experiment results?

Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments, with replication success strongly correlated with the original p-value strength. Marginal effects were unreliable, producing both false positives and false negatives.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Should persona simulation prioritize coverage over statistical matching?

Evolutionary optimization of Persona Generator code achieves broader trait coverage than density-matched baselines, including rare but consequential user configurations that naive LLM prompting misses.
