Psychology and Social Cognition · Language Understanding and Pragmatics

What combination of factors explains differences in LLM persuasiveness?

Why do some LLM persuasion studies show strong effects while others show none? This note explores whether model choice, conversation design, and topic domain jointly predict when AI actually persuades.

Note · 2026-05-02 · sourced from Argumentation

When the Bilstein meta-analysis tested moderators individually, none reached significance — likely a power problem with only 7 studies. But the joint model combining LLM model family, conversation design (one-shot vs. interactive multi-turn), and topic domain (health, political, etc.) explained 81.93% of between-study variance (R²) and dropped residual heterogeneity from I² = 75.97% to 35.51%. The conditional patterns reported, holding the other factors constant: interactive multi-turn outperformed one-shot formats; GPT-4-based models outperformed Claude 3.x; health topics yielded stronger effects than political ones.
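The two headline statistics are related but distinct: meta-regression R² is the share of between-study variance (τ²) absorbed by the moderators, while Higgins' I² is the share of total variability attributable to heterogeneity rather than sampling error. A minimal sketch of both formulas, using hypothetical τ² and Q values chosen only to reproduce the percentages reported above:

```python
def meta_r2(tau2_total: float, tau2_residual: float) -> float:
    """Meta-regression R^2 = 1 - tau^2_residual / tau^2_total,
    i.e. the proportion of between-study variance explained by moderators."""
    return 1.0 - tau2_residual / tau2_total

def i_squared(Q: float, df: int) -> float:
    """Higgins' I^2 = max(0, (Q - df) / Q) * 100, where Q is Cochran's
    heterogeneity statistic and df = number of studies - 1."""
    return max(0.0, (Q - df) / Q) * 100.0

# Hypothetical inputs (NOT from the paper) picked to match the note's numbers:
print(round(meta_r2(0.1000, 0.01807) * 100, 2))  # ~81.93 (% variance explained)
print(round(i_squared(24.97, 6), 2))             # ~75.97 (I^2 before moderators)
```

The τ² and Q inputs here are illustrative placeholders; the paper's actual variance components would be needed to reproduce the fit exactly.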

This is the operational corollary of "Are language models actually more persuasive than humans?". The pooled-null result and the joint-moderator result are not in tension — they are two sides of the same finding. Average effect ≈ 0; conditional effect = whatever the model × design × domain combination dictates. The persuasive footprint is in the dial settings, not in the category.

The multi-turn-beats-one-shot finding reweights design priorities. It connects directly to "Why do AI conversations reliably break down after multiple turns?" as a topic area: persuasive influence accrues across turns, and conversational architecture is consequential for outcomes that one-shot generation cannot reach. It also sits in productive tension with "Why does AI persuasion weaken over repeated interactions?". Bilstein finds interactive setups more persuasive than one-shot in pooled terms; Schoenegger finds the persuasive advantage over humans waning across rounds. Both can be true: the multi-turn benefit is real but shared with human persuaders, while the LLM-specific edge is concentrated at first contact.

The model-family signal (GPT-4 > Claude 3.x in this corpus) cautions against generalizing from any single model. Claims about "LLM persuasiveness" anchored to one architecture should be read as architecture-specific until replicated.

For writing about AI persuasion, the operational rule: don't quote a single-study effect size. Cite the meta-analytic null, then specify the dial settings under which a conditional effect appears.


Source: Argumentation · Paper: A meta-analysis of the persuasive power of large language models
