The Levers of Political Persuasion with Conversational AI

Paper · arXiv 2507.13919 · Published July 18, 2025
Topics: Conversation · Dialog · Argumentation · Discourse · Natural Language Inference · Linguistics · NLP · NLU

There are widespread fears that conversational AI could soon exert unprecedented influence over human beliefs. Here, in three large-scale experiments (N=76,977), we deployed 19 LLMs—including some post-trained explicitly for persuasion—to evaluate their persuasiveness on 707 political issues. We then checked the factual accuracy of 466,769 resulting LLM claims. Contrary to popular concerns, we show that the persuasive power of current and near-future AI is likely to stem more from post-training and prompting methods—which boosted persuasiveness by as much as 51% and 27% respectively—than from personalization or increasing model scale. We further show that these methods increased persuasion by exploiting LLMs’ unique ability to rapidly access and strategically deploy information and that, strikingly, where they increased AI persuasiveness they also systematically decreased factual accuracy.

Using LLMs and professional human fact-checkers, we then counted and evaluated the accuracy of 466,769 fact-checkable claims made by the LLMs across more than 91,000 persuasive conversations. The resulting dataset is, to our knowledge, the largest and most systematic investigation of AI persuasion to date, offering an unprecedented window into how and when conversational AI can influence human beliefs. Our findings thus provide a foundation for anticipating how persuasive capabilities could evolve as AI models continue to develop and proliferate, and help identify which areas may deserve particular attention from researchers, policymakers, and technologists concerned about AI's societal impact.
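For illustration only, here is a minimal sketch of what one stage of such a claim-counting pipeline could look like, assuming an OpenAI-style chat API; the model name, prompt wording, and the expectation of JSON output are all assumptions, not the authors' actual protocol:

```python
# Hypothetical claim-extraction step, loosely inspired by the procedure
# described above. Model choice and prompt wording are assumptions.
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = (
    "List every fact-checkable claim made by the assistant in the "
    "conversation below. Respond with a JSON array of strings, one claim "
    "per element, and nothing else.\n\n{transcript}"
)

def extract_claims(transcript: str, model: str = "gpt-4o") -> list[str]:
    """Ask an LLM to enumerate the fact-checkable claims in a transcript."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": PROMPT.format(transcript=transcript)}],
    )
    # In practice the raw output would need validation (and, as in the
    # paper, checking by professional human fact-checkers).
    return json.loads(response.choices[0].message.content)
```

Each extracted claim could then be passed to a separate verification step, rated by LLMs and human fact-checkers, to produce per-conversation accuracy measures.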

UK adults engaged in a back-and-forth conversation (minimum 2 turns, maximum 10) with an LLM. Before and after the conversation, they reported their level of agreement with a series of written statements expressing a particular political opinion relevant to the UK, using a 0–100 percentage-point scale. In the treatment group, the LLM was prompted to persuade the user to adopt a pre-specified stance on the issue, using a persuasion strategy randomly selected from 8 possible strategies (see Methods). Throughout, we measure the persuasive effect as the difference in mean post-treatment opinion between the treatment group and a control group in which there was no persuasive conversation (unless stated otherwise). Although participants were crowd-workers required to complete only 2 conversation turns to receive a fixed show-up fee, treatment dialogues lasted an average of 7 turns and 9 minutes.
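To make the estimand concrete, here is a minimal sketch of that difference-in-means estimator, assuming a tidy table with one row per participant; the column names (`condition`, `post_opinion`) are illustrative, not the study's actual variable names:

```python
# Minimal sketch of the persuasive-effect estimator described above:
# the difference in mean post-treatment opinion between the treatment
# group and the no-conversation control group. Column names are assumed.
import pandas as pd

def persuasive_effect(df: pd.DataFrame) -> float:
    treated = df.loc[df["condition"] == "treatment", "post_opinion"]
    control = df.loc[df["condition"] == "control", "post_opinion"]
    return treated.mean() - control.mean()

# Toy data: a +4.5-point effect on the 0-100 opinion scale.
toy = pd.DataFrame({
    "condition":    ["treatment"] * 4 + ["control"] * 4,
    "post_opinion": [62, 71, 58, 67, 60, 63, 55, 62],
})
print(persuasive_effect(toy))  # 64.5 - 60.0 = 4.5
```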

Before addressing our main research questions, we begin by validating two key motivating assumptions of our work: that conversing with AI (i) is meaningfully more persuasive than exposure to a static AI-generated message and (ii) can cause durable attitude change. To validate (i), we included two static-message conditions in which participants read a 200-word persuasive message written by GPT-4o (study 1) or GPT-4.5 (study 3) but did not engage in a conversation. As predicted, the AI was substantially more persuasive in conversation than via a static message.