INQUIRING LINE

Why do aggregate persuasion metrics mask what actually changes minds?

This explores why headline persuasion numbers — overall 'win rates' or average effect sizes — hide the things that actually move a person: who they already are, which mental route the argument travels, and how the relationship changes over time.


This question is really about a measurement trap: when you report a single persuasion rate, you average away the very variables that decide whether a mind changes. The corpus keeps finding that the action lives in the moderators, not the mean. The clearest example is that a reader's prior beliefs predict the outcome better than the language of the argument does: political and religious leanings outpredict linguistic features, and apparent 'language effects' turn out to be confounded by which audiences happen to care about which topics (Does what readers believe matter more than what debaters say?). So an aggregate that credits the message is often really measuring who was in the room.
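The confounding described above can be sketched with a toy tally. Every count below is invented for illustration, and the labels `sympathetic`, `skeptical`, `vivid`, and `plain` are hypothetical; nothing here comes from the debate corpus itself. In this sketch the message style has no effect at all within either audience, yet the pooled rates make one style look far better because it happened to reach more sympathetic readers.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only:
# (reader_prior, message_style) -> (persuaded, total)
counts = {
    ("sympathetic", "vivid"): (60, 100),
    ("sympathetic", "plain"): (12, 20),
    ("skeptical", "vivid"): (2, 20),
    ("skeptical", "plain"): (10, 100),
}


def pooled_rate_by_style(counts):
    """Persuasion rate per style, ignoring who the readers were."""
    totals = defaultdict(lambda: [0, 0])
    for (_prior, style), (persuaded, n) in counts.items():
        totals[style][0] += persuaded
        totals[style][1] += n
    return {style: p / n for style, (p, n) in totals.items()}


def rate_by_prior_and_style(counts):
    """Persuasion rate per (prior, style) cell, i.e. stratified by reader prior."""
    return {key: p / n for key, (p, n) in counts.items()}


pooled = pooled_rate_by_style(counts)
stratified = rate_by_prior_and_style(counts)

for style, r in sorted(pooled.items()):
    print(f"pooled   {style:6s} {r:.2f}")
for (prior, style), r in sorted(stratified.items()):
    print(f"{prior:12s} {style:6s} {r:.2f}")
```

Within each audience the two styles score the same, so the gap in the pooled rates reflects audience composition alone.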

When researchers actually decompose the variance, the masking becomes concrete. A meta-analysis found that model family, one-shot-versus-multi-turn design, and topic domain together explain about 82% of the between-study variance (What combination of factors explains differences in LLM persuasiveness?). A single 'LLMs are persuasive' number collapses all of that structure into one figure that describes no particular situation. The result even depends on the model and on the truth of the claim: Claude out-persuades incentivized humans whether arguing for true or false claims, while DeepSeek only wins when arguing for falsehoods, so a pooled average blends distinct phenomena into a misleading middle (Do large language models persuade better than humans?).
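As a rough sketch of what "explaining between-study variance" means, the share of variance accounted for by a single grouping factor can be computed as one minus the within-group sum of squares over the total sum of squares. The effect sizes below are invented, and this simple one-factor decomposition is an assumption chosen for illustration; it is not the joint model the meta-analysis fitted.

```python
from statistics import mean

# Hypothetical per-study effect sizes, invented for illustration only.
studies = [
    ("one-shot", 0.10), ("one-shot", 0.14), ("one-shot", 0.12),
    ("multi-turn", 0.40), ("multi-turn", 0.44), ("multi-turn", 0.36),
]


def variance_explained(studies):
    """Share of total variance captured by group means (1 - SS_within / SS_total)."""
    effects = [effect for _label, effect in studies]
    grand_mean = mean(effects)
    ss_total = sum((e - grand_mean) ** 2 for e in effects)

    groups = {}
    for label, effect in studies:
        groups.setdefault(label, []).append(effect)
    ss_within = sum(
        (e - mean(group)) ** 2 for group in groups.values() for e in group
    )
    return 1 - ss_within / ss_total


grand_mean = mean(effect for _label, effect in studies)
r_squared = variance_explained(studies)
print(f"pooled mean effect: {grand_mean:.2f}")
print(f"share of variance explained by design: {r_squared:.2f}")
```

In this toy data the pooled mean sits between the two designs and matches neither of them, which is the sense in which a single headline figure describes no particular situation.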

Aggregates also flatten the mechanism, the *how* of mind-changing. Humans and machines don't persuade the same way: LLMs travel the 'central route' through analytical reasoning and informational coherence, while humans work the 'peripheral route' through emotional vividness and identity cues (Do humans and AI persuade through different cognitive routes?). Two persuaders can post identical scores while changing minds through entirely different cognitive doors, and a single metric can't tell you which door, or which audience that door even works on. Relatedly, LLMs reach for logic and quantitative framing in nearly every exchange, which makes them *look* objective and lends them unearned epistemic authority. That is an effect of perceived credibility, not argument quality, and no win-rate captures it (llms-spontaneously-persuade-in-virtually-every-conversation-even-when-unwarrente).

The sharpest blind spot is time. A one-shot persuasion score is a snapshot, but the dynamics run in opposite directions for humans and machines: AI shows a strong initial edge that erodes across repeated interactions, while human persuaders hold steady, and in human-to-human persuasion rapport typically strengthens the effect over time (Does AI persuasiveness fade across repeated conversations with the same person?). Average those rounds together and you erase the decay curve that is the actual story. And sometimes what changes minds isn't the argument at all: users prefer answers with more citations even when the citations are irrelevant, because citation *count* works as a decoupled trust heuristic (Do users trust citations more when there are simply more of them?). A persuasion metric tells you the mind moved; it doesn't tell you a surface cue did the moving.
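To make the averaging problem concrete, here is a minimal sketch with invented per-round persuasion rates for a hypothetical AI persuader and a hypothetical human persuader. The numbers are assumptions chosen so that both have the same average; only the least-squares slope across rounds reveals that one of them is decaying.

```python
from statistics import mean

# Hypothetical per-round persuasion rates, invented for illustration only.
rounds = [1, 2, 3, 4, 5]
ai_rates = [0.70, 0.60, 0.50, 0.40, 0.30]      # strong start, steady erosion
human_rates = [0.50, 0.50, 0.50, 0.50, 0.50]   # flat across rounds


def slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    x_bar, y_bar = mean(xs), mean(ys)
    numerator = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    denominator = sum((x - x_bar) ** 2 for x in xs)
    return numerator / denominator


ai_mean, human_mean = mean(ai_rates), mean(human_rates)
ai_slope, human_slope = slope(rounds, ai_rates), slope(rounds, human_rates)

print(f"AI:    mean={ai_mean:.2f}  slope per round={ai_slope:+.2f}")
print(f"Human: mean={human_mean:.2f}  slope per round={human_slope:+.2f}")
```

Reporting a per-round trend alongside the mean is one simple way to keep the time dimension visible; other summaries, such as first-round versus last-round rates, would serve the same purpose.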

The thread across all of this: persuasion is an interaction effect between message, person, route, and time, and an aggregate is precisely the operation that throws interaction effects away. The useful question is never 'how persuasive is it' but 'persuasive to whom, by which route, in truth or falsehood, and for how long.' If you want to see how thoroughly these levers can be exploited rather than just measured, the persuasion-taxonomy jailbreak work shows fluent, technique-driven persuasion slipping past defenses that screen for odd patterns instead of convincing content (Can social science persuasion techniques jailbreak frontier AI models?).
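The claim that averaging discards interaction effects can be illustrated with a two-by-two table. The rates and the audience labels `analytical` and `identity-driven` below are hypothetical, and the sketch assumes equally sized audiences. Each route works well on one audience and poorly on the other, so the marginal averages are identical while the interaction contrast (a difference of differences) is large.

```python
# Hypothetical persuasion rates, invented for illustration only:
# rates[route][audience]
rates = {
    "central": {"analytical": 0.60, "identity-driven": 0.20},
    "peripheral": {"analytical": 0.20, "identity-driven": 0.60},
}


def marginal_mean(rates, route):
    """Average rate for a route across audiences (assumes equal audience sizes)."""
    values = rates[route].values()
    return sum(values) / len(values)


def interaction_contrast(rates, route_a, route_b, audience_x, audience_y):
    """Difference of differences: how much the route gap changes across audiences."""
    gap_x = rates[route_a][audience_x] - rates[route_b][audience_x]
    gap_y = rates[route_a][audience_y] - rates[route_b][audience_y]
    return gap_x - gap_y


central_mean = marginal_mean(rates, "central")
peripheral_mean = marginal_mean(rates, "peripheral")
contrast = interaction_contrast(
    rates, "central", "peripheral", "analytical", "identity-driven"
)

print(f"central mean={central_mean:.2f}, peripheral mean={peripheral_mean:.2f}")
print(f"interaction contrast={contrast:.2f}")
```

A nonzero contrast alongside equal marginal means is the signature of a pure interaction: the averages say the routes are equivalent, while the cells say each route only works on its own audience.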


Sources (8 notes)

Does what readers believe matter more than what debaters say?

Analysis of debate corpora shows that political and religious ideology labels of voters outpredict linguistic features when modeling debate outcomes. Language effects observed without reader controls are confounded by audience composition correlated with debate topics.

What combination of factors explains differences in LLM persuasiveness?

A joint meta-analytic model combining LLM architecture, one-shot versus multi-turn format, and topic domain explained 81.93% of between-study variance (R²). Interactive multi-turn designs and GPT-4 consistently outperformed one-shot formats and Claude 3.x.

Do large language models persuade better than humans?

Claude beats incentivized humans at both truthful and deceptive persuasion, while DeepSeek only beats them when arguing for falsehoods. The persuasion mechanism appears content-independent, suggesting model family itself acts as a contextual moderator.

Do humans and AI persuade through different cognitive routes?

Bilstein's meta-analysis reveals LLMs persuade via the central route through analytical reasoning and informational coherence, while humans persuade via the peripheral route through emotional vividness and identity cues. Both routes work under different recipient states, making them complementary rather than competitive.

Does AI persuasiveness fade across repeated conversations with the same person?

Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.

Do users trust citations more when there are simply more of them?

Analysis of 24,000 Search Arena interactions shows irrelevant citations boost user preference (β=0.273) nearly as much as relevant citations (β=0.285), indicating citation count functions as a decoupled trust heuristic.

Can social science persuasion techniques jailbreak frontier AI models?

A 40-technique taxonomy of psychology-based persuasion strategies (PAP) achieved over 92% attack success on GPT-3.5, GPT-4, and Llama-2 in 10 trials. Current defenses miss semantic content attacks because they screen for unusual patterns, not fluent persuasion.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a persuasion research analyst. The question: **Why do aggregate persuasion metrics mask what actually changes minds?** This remains open—capability gains may have shifted HOW masking happens, not whether it happens.

**What a curated library found — and when (dated claims, not current truth):**
Findings span 2019–2026; treat as perishable snapshots:

• Readers' prior beliefs (political and religious ideology) predict persuasion outcomes better than message features; linguistic effects are often confounded by audience composition (2019, echoed through 2025).
• Model family, conversation design (one-shot vs. multi-turn), and topic domain explain ~82% of between-study variance; a single 'LLM persuasion rate' collapses this structure (2024–2025).
• LLMs and humans persuade via different routes: LLMs use central (analytical) reasoning; humans use peripheral (emotional, identity-based) cues. Identical win-rates can mask different mechanisms (2024).
• LLM persuasiveness erodes over repeated interactions, while human persuaders hold steady or strengthen via rapport; averaging flattens this asymmetric decay curve (2025).
• Citation *count* alone shifts user trust even when citations are irrelevant; persuasion metrics miss this decoupled heuristic (2025).
• Persuasion-taxonomy jailbreaks achieve over 92% attack success because they rely on fluent, convincing content that pattern-screening defenses miss (2024).

**Anchor papers (verify; mind their dates):**
• arXiv:1906.11301 (2019) – Prior beliefs ground persuasion.
• arXiv:2404.09329 (2024) – Cognitive routes diverge by agent type.
• arXiv:2505.09662 (2025) – LLM vs. human asymmetry across truth/falsehood.
• arXiv:2604.22109 (2026) – Spontaneous persuasion audit in everyday conversation.

**Your task:**
(1) **RE-TEST EACH CONSTRAINT.** For every finding above, ask: Have newer model scales, longer context windows, improved instruction-tuning, retrieval-augmented generation (RAG), or agent frameworks since *relaxed* the route-divergence, decay-curve, or citation-heuristic findings? Which constraints still hold? Cite what shifted them.
(2) **Surface the strongest CONTRADICTING work from the last ~6 months.** Look for papers claiming aggregate metrics *do* capture something durable, or showing LLM persuasiveness *stabilizes* rather than decays, or demonstrating humans *also* exploit surface cues as much as LLMs.
(3) **Propose 2 research questions that assume the regime may have moved:**
   - Does multi-agent orchestration (debate, critique loops) *restore* or *deepen* the audience-heterogeneity masking problem?
   - If LLMs now run longer, finer persuasive chains, does the peripheral-vs.-central split collapse or sharpen?

**Cite arXiv IDs; flag anything you cannot ground in a real paper.**
