When Large Language Models are More Persuasive Than Incentivized Humans, and Why

Paper · arXiv 2505.09662 · Published May 14, 2025

Large Language Models (LLMs) have been shown to be highly persuasive, but when and why they outperform humans is still an open question. We compare the persuasiveness of two LLMs (Claude 3.5 Sonnet and DeepSeek v3) against humans who had incentives to persuade, using an interactive, real-time conversational setting. We demonstrate that LLMs’ persuasive superiority is context-dependent: it depends on whether the persuasion attempt is “truthful” (towards the correct answer) or “deceptive” (towards the wrong answer), on the LLM model, and, unlike human persuasiveness, it wanes over repeated interactions. In our first large-scale experiment, human and LLM (Claude 3.5 Sonnet) persuaders interacted with humans completing an online quiz for a reward, attempting to persuade them toward a given (correct or incorrect) answer. Claude was more persuasive than incentivized human persuaders in both truthful and deceptive contexts; it significantly increased accuracy when persuasion was truthful and decreased it when persuasion was deceptive. In a follow-up experiment with DeepSeek v3, we replicated the accuracy findings but found greater LLM persuasiveness only when the persuasion was deceptive. Linguistic analyses of the persuaders’ texts suggest that these effects may be due to LLMs expressing higher conviction than humans.

1.3. LLMs vs Incentivized Humans

We address the above research gaps in two experiments in which participants took a 10-question quiz while interacting with human or LLM persuaders attempting to steer them toward correct or incorrect answers. In our first pre-registered study (Study 1; Figure 1), the LLM persuader was Claude 3.5 Sonnet; DeepSeek v3 was the persuader in the follow-up Study 2. Both studies share two key design features: a) verifiable questions (e.g., trivia questions), allowing us to study both truthful and deceptive persuasion, and b) incentives for both human persuaders (paid when quiz takers answered in the persuader’s assigned direction) and quiz takers (paid for correct answers), allowing us to benchmark LLMs against humans in a higher-stakes scenario than in the literature to date.
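To make the incentive structure concrete, the sketch below encodes one plausible reading of the per-question payoffs. The bonus amounts, the `Trial` fields, and the `payoffs` helper are illustrative assumptions introduced for exposition; the paper specifies only that both sides were incentivized, not these exact values.

```python
from dataclasses import dataclass

# Hypothetical payoff amounts. The study specifies that incentives exist,
# but these particular bonus values are illustrative assumptions.
TAKER_BONUS_PER_CORRECT = 0.10      # quiz taker's reward per correct answer
PERSUADER_BONUS_PER_SUCCESS = 0.10  # persuader's reward when the taker follows them

@dataclass
class Trial:
    correct_answer: str   # ground-truth answer to the trivia question
    assigned_answer: str  # answer the persuader is assigned to argue for
    taker_answer: str     # answer the quiz taker finally submits

    @property
    def truthful(self) -> bool:
        # Truthful condition: the persuader is assigned the correct answer;
        # deceptive condition: the persuader is assigned a wrong answer.
        return self.assigned_answer == self.correct_answer

def payoffs(trial: Trial) -> tuple[float, float]:
    """Return (taker_bonus, persuader_bonus) for one quiz question."""
    taker = TAKER_BONUS_PER_CORRECT if trial.taker_answer == trial.correct_answer else 0.0
    persuader = PERSUADER_BONUS_PER_SUCCESS if trial.taker_answer == trial.assigned_answer else 0.0
    return taker, persuader

# Example: a successful deceptive persuasion attempt. The persuader earns a
# bonus while the quiz taker forfeits the accuracy reward.
t = Trial(correct_answer="A", assigned_answer="B", taker_answer="B")
print(payoffs(t))  # (0.0, 0.1)
```

Note how the deceptive condition puts the two incentives in direct conflict: a persuasion success there necessarily costs the quiz taker their accuracy reward.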

We examine five key pre-registered research questions to study how LLM persuaders compare with humans (one possible operationalization is sketched after the list):

RQ1: Are LLMs more persuasive than humans?

RQ2: Are LLMs (vs. humans) more persuasive at steering participants toward correct answers (truthful persuasion)?

RQ3: Are LLMs (vs. humans) more persuasive at steering participants toward incorrect answers (deceptive persuasion)?

RQ4: In truthful persuasion, do LLMs or humans boost quiz takers’ accuracy (and earnings)?

RQ5: In deceptive persuasion, do LLMs or humans reduce quiz takers’ accuracy (and earnings)?
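As referenced above, one way these questions could be operationalized (an assumption on our part, not the paper's reported analysis) is to code each question as a binary persuasion success, meaning the quiz taker's final answer matched the persuader's assigned direction, and compare success rates across persuader types. RQ2 and RQ3 repeat the comparison within the truthful and deceptive subsets, while RQ4 and RQ5 swap the outcome for answer correctness. A minimal sketch using a two-proportion z-test, with the function name and toy success rates as hypothetical placeholders:

```python
import numpy as np
from statsmodels.stats.proportion import proportions_ztest

def compare_persuaders(llm_success: np.ndarray, human_success: np.ndarray):
    """Two-proportion z-test on binary success indicators (one per question)."""
    counts = np.array([llm_success.sum(), human_success.sum()])
    nobs = np.array([llm_success.size, human_success.size])
    stat, pval = proportions_ztest(counts, nobs)
    return counts / nobs, stat, pval

# Toy data: 1 means the quiz taker followed the persuader on that question.
rng = np.random.default_rng(0)
llm = rng.binomial(1, 0.65, size=500)    # hypothetical LLM success rate
human = rng.binomial(1, 0.55, size=500)  # hypothetical human success rate

rates, z, p = compare_persuaders(llm, human)
print(rates, z, p)
```

A full analysis would also need to account for repeated measures, since each quiz taker answers ten questions (e.g., via a mixed-effects logistic regression with quiz taker as a random effect).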

To the best of our knowledge, this is the first study to compare AI and human persuasion with financial performance incentives for both persuaders and persuadees, thereby bridging the gap between the minimal stakes typical of research settings and real-world persuasion, which can involve high financial or reputational stakes, especially at scale. This is possible only because we test AI-human persuasion on verifiable questions.