Can LLMs adapt persuasion strategies when they cannot track the listener's state?
This explores whether LLMs can tailor their persuasion on the fly — and the corpus suggests adaptation and listener-tracking are the same skill, so when tracking fails, the strategy stays frozen.
Put plainly: can a model adjust *how* it persuades when it can't read where the listener actually is? The collection's answer is unusually direct: adaptive persuasion and listener tracking turn out to be one capability, not two. When the model can't track the listener, it doesn't switch to a fallback strategy; it keeps running the same one regardless of who's in front of it. The cleanest evidence is that LLMs match humans at tracking *fixed* mental states (a persuader's unchanging goal) but significantly underperform on *shifting* ones, such as a listener's growing resistance or wavering conviction (Can language models track how minds change during persuasion?). Since adaptation is precisely the act of responding to those shifts, the failure to track is the failure to adapt.
What the model does instead is revealing. Rather than read the room, it defaults to a fixed register: logical appeals and quantitative framing in nearly every exchange, where humans vary toward emotion and social proof (Do LLMs persuade users more often than humans do?). And the default isn't neutral: RLHF bakes in a bias toward conciliatory, benefit-oriented persuasion that the model projects onto everyone, regardless of what the actual dialogue calls for (Do LLMs predict persuasion based on actual dialogue or training bias?). So the strategy isn't merely unadaptive; it's a single learned posture applied universally. This is why models miss users who are ambivalent or in early stages of change. They succeed only once a user already has an established goal, and they can't detect the resistance that would tell a human persuader to change tack (Why can't chatbots detect when users are ambivalent about change?).
The non-adaptation shows up most starkly over time. Human-to-human persuasion typically strengthens across repeated contact as rapport builds, and the human persuaders in the comparison held their effectiveness steady; LLM persuasiveness *decays* across repeated rounds with the same person (Does AI persuasiveness fade across repeated conversations with the same person?). That decay is the signature of a strategy that can't update: the same opening move stops landing once the listener has heard it, and there's no second move informed by how the first one went. Strikingly, models persuade effectively even though they cannot reliably evaluate the very debates they are swaying (Can LLMs persuade without actually understanding arguments?), which explains how a frozen strategy can still work initially: the persuasive force is somewhat content-independent, riding on fluency rather than on a read of the listener.
The deeper diagnosis in the corpus is that this is an architecture problem, not a tuning problem. LLMs look socially competent mainly when one model secretly controls all sides of a conversation; the moment a participant holds private information the model can't see, performance collapses, because the grounding work that real adaptation requires is exactly what the model skips (Why do LLMs fail when simulating agents with private information?). The proposed fix is telling: faithful social simulation would need models that represent the *thought* behind behavior, meaning belief networks and reasoning traces, rather than just emitting plausible outputs (Can language models simulate belief change in people?). Without an internal model of the listener's evolving state, there's nothing to adapt the strategy *to*.
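To make the contrast between a frozen strategy and a listener-tracking one concrete, here is a minimal toy sketch. It is not drawn from any of the cited studies: the `Listener`, `FrozenPersuader`, and `TrackingPersuader` classes, the three appeal names, and the 0.5 and 0.9 factors are all hypothetical choices made only for illustration. The assumption is that an appeal loses force each time the same listener hears it again.

```python
from dataclasses import dataclass, field

# Hypothetical appeal types, chosen only for illustration.
APPEALS = ("logic", "emotion", "social_proof")


@dataclass
class Listener:
    """Toy listener whose receptivity to an appeal halves on each reuse."""
    receptivity: dict = field(default_factory=lambda: {a: 1.0 for a in APPEALS})
    habituation: float = 0.5  # assumed decay per repeated exposure

    def hear(self, appeal: str) -> float:
        """Return the effect of this appeal, then habituate to it."""
        effect = self.receptivity[appeal]
        self.receptivity[appeal] *= self.habituation
        return effect


class FrozenPersuader:
    """Keeps no listener state and always makes the same move."""

    def choose(self) -> str:
        return "logic"

    def observe(self, appeal: str, effect: float) -> None:
        pass  # feedback is ignored, so nothing can change


class TrackingPersuader:
    """Keeps a running estimate of how well each appeal currently lands."""

    def __init__(self) -> None:
        self.estimate = {a: 1.0 for a in APPEALS}

    def choose(self) -> str:
        # Pick the appeal currently believed to land best (ties: first listed).
        return max(APPEALS, key=lambda a: self.estimate[a])

    def observe(self, appeal: str, effect: float) -> None:
        # Crude belief update: expect a reused appeal to do slightly worse
        # than it just did (the 0.9 discount is an assumption).
        self.estimate[appeal] = effect * 0.9


def run(persuader, rounds: int = 6) -> list:
    """Play repeated rounds against one fresh listener; return per-round effects."""
    listener = Listener()
    effects = []
    for _ in range(rounds):
        appeal = persuader.choose()
        effect = listener.hear(appeal)
        persuader.observe(appeal, effect)
        effects.append(effect)
    return effects


frozen = run(FrozenPersuader())
tracking = run(TrackingPersuader())
print("frozen  :", frozen, "total", sum(frozen))
print("tracking:", tracking, "total", sum(tracking))
```

In this toy setup the frozen persuader's effect halves every round, while the tracking persuader rotates through appeals and holds its effect up for longer. The only structural difference between the two is that one stores and updates an estimate of the listener and the other has nothing to update, which is the point the paragraph above makes about needing an internal model of the listener's state.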
The unsettling corner here: the same listener-blindness that limits benign persuasion doesn't protect against adversarial use. A 40-technique taxonomy of human persuasion strategies achieved over 92% attack success against GPT-3.5, GPT-4, and Llama-2 (Can social science persuasion techniques jailbreak frontier AI models?), and models will abandon correct beliefs under sustained conversational pressure with no new evidence (Can models abandon correct beliefs under conversational pressure?). So LLMs are simultaneously poor at *tracking* a listener in order to persuade them well and highly *vulnerable* to being persuaded themselves. The deficit cuts both ways.
Sources (10 notes)
LLMs match human performance on static mental states like a persuader's unchanging goal, but significantly underperform on dynamic shifts like a persuadee's evolving resistance. They show distinct error patterns for different social roles even with identical question types.
LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.
Testing three major LLMs across 25 health scenarios showed they succeed only when users have established goals but cannot detect resistance or ambivalence. Models miss relapse-prevention strategies even for users in action stages.
Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.
The Thin Line study shows LLMs sway debate participants and audiences but cannot reliably evaluate those same debates, with inter-annotator agreement ranging from near-zero to 0.6. Persuasive competence and pragmatic comprehension are separable capabilities.
Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.
LLM agents remain stuck in behaviorism, producing plausible outputs without internal reasoning structures. Modeling belief networks and reasoning traces enables traceability, counterfactual adaptation, and meaningful policy simulation.
A 40-technique taxonomy of psychology-based persuasion strategies (PAP) achieved over 92% attack success on GPT-3.5, GPT-4, and Llama-2 in 10 trials. Current defenses miss semantic content attacks because they screen for unusual patterns, not fluent persuasion.
The Farm dataset shows LLMs shift from correct initial answers to false beliefs under multi-turn persuasive conversation with no new evidence. Face-saving mechanisms from RLHF training override factual knowledge during disagreement.