Can lightweight linguistic features reliably detect AI-generated persuasive text?
This explores whether cheap, transparent text features — not heavyweight neural detectors — can spot AI-written persuasion, and the corpus answer is a surprising yes, because the same systematic habits that make AI persuasive also make it detectable.
The corpus's answer is strikingly clear. On Reddit's r/ChangeMyView, a handful of interpretable linguistic and argument-quality features reached 99% accuracy at separating LLM counter-arguments from human ones, matching expensive neural detectors while staying computationally cheap and human-readable (Can simple linguistic features detect AI-written arguments?). The reason isn't that the features are clever; it's that LLMs leave consistent fingerprints: over-accommodation to the prompt and a kind of textbook-quality argument polish that real people rarely produce.
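To make "cheap and human-readable" concrete, here is a minimal sketch of what such features could look like. It is an illustration under assumptions, not the feature set used in the r/ChangeMyView study: the word lists, the feature names, and the `prompt_overlap` proxy for prompt accommodation are all hypothetical.

```python
import re

# Hypothetical marker lists, chosen only for illustration.
# They are NOT the feature set from the r/ChangeMyView study.
CONNECTIVES = {"however", "therefore", "furthermore", "moreover",
               "additionally", "consequently"}
HEDGES = {"maybe", "perhaps", "probably", "guess", "kinda", "honestly"}


def tokenize(text):
    """Lowercase the text and return runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def split_sentences(text):
    """Naive sentence split on ., ! and ?; empty fragments are dropped."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def extract_features(text):
    """Return a small dict of interpretable surface features."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        "mean_sentence_len": len(tokens) / max(len(sentences), 1),
        "type_token_ratio": len(set(tokens)) / n,
        "connective_rate": sum(t in CONNECTIVES for t in tokens) / n,
        "hedge_rate": sum(t in HEDGES for t in tokens) / n,
    }


def prompt_overlap(prompt, reply):
    """Fraction of reply tokens that also occur in the prompt.

    A crude, assumed proxy for 'accommodation to the prompt'.
    """
    prompt_vocab = set(tokenize(prompt))
    reply_tokens = tokenize(reply)
    if not reply_tokens:
        return 0.0
    return sum(t in prompt_vocab for t in reply_tokens) / len(reply_tokens)
```

Each value is a plain ratio or average that a reader can inspect directly. In practice, features like these would be fed to a simple classifier such as logistic regression, which keeps the decision traceable to individual features.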
The more interesting question is *why* that signal is so clean, and here the persuasion research connects laterally. AI persuasion is systematic in ways human persuasion isn't: audited models reach for logical appeals and quantitative framing in nearly every exchange, while humans lean on emotion and social proof and do so less often (Do LLMs persuade users more often than humans do?). That regularity is a detector's dream: a style that 'always argues like a debate textbook' is easy to learn. RLHF deepens the groove, biasing models to expect conciliatory, benefit-framed persuasion regardless of dialogue context (Do LLMs predict persuasion based on actual dialogue or training bias?). AI writing assistance also measurably shifts how readers perceive the writer, toward confidence, agreeableness, and even extremism, across all 29 dimensions tested (Does AI writing assistance change how readers perceive the writer?). The traits that make AI text feel authoritative are the traits that give it away.
There's a catch worth knowing, though. Surface style can be edited or 'humanized' away. The most robust detection signal turns out to live deeper than word choice: AI fiction is separable from human writing at 93.2% accuracy using *only* discourse-level structure, such as character agency and chronology, and keeps 97% of that performance even after stylistic cues are stripped out (Can AI stories be detected without analyzing writing style?). Structure resists evasion because faking it requires a rewrite, not a find-and-replace. So 'lightweight' features work today, but the durable bet is on structural signatures, not surface ones.
A second catch: AI persuasion isn't actually static. GPT-4 recalibrates its mix of ethos, logos, and pathos depending on how it's challenged: credibility when fact-checked, logic when pushed back on, emotion when caught in error (Does GenAI shift persuasion tactics based on how you challenge it?). And the persuasive edge of Claude and DeepSeek eroded across repeated quiz rounds rather than building rapport the way human persuasion typically does (Does AI persuasiveness fade across repeated conversations with the same person?). A detector trained on one conversational stance may not generalize to a model that's adapting its rhetoric mid-dialogue.
The thing you didn't know you wanted to know: detectability and persuasiveness come from the same source. The 'objective,' logic-heavy register that confers unearned epistemic authority on AI arguments (llms-spontaneously-persuade-in-virtually-every-conversation-even-when-unwarrente) is the very pattern a 99%-accurate classifier keys on. For now, the machine's greatest rhetorical strength, its relentless consistency, is also its tell.
Sources (7 notes)
General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.
LLMs systematically predict conciliatory, benefit-oriented persuasion intentions regardless of dialogue context. This bias originates in RLHF's prioritization of safety and politeness during training, causing models to project their learned accommodation preference onto other agents' behavior.
A study of 2,939 writers and 11,091 readers found AI assistance shifted every tested dimension—29 total—toward extremism, confidence, quality, agreeableness, and perceived privilege. Distortions were statistically significant and directional, not random noise.
StoryScope achieved 93.2% accuracy separating AI from human fiction using only discourse-level features like character agency and chronological structure, retaining 97% of performance while eliminating stylistic cues. These structural choices resist humanization because they require rewrites, not surface edits.
GPT-4 shifts both intensity and balance of ethos, logos, and pathos across three validation behaviors. Fact-checking triggers credibility emphasis; pushback triggers logical reasoning; error exposure triggers emotional alignment. No single counter-strategy exists.
Claude and DeepSeek showed strong initial persuasive advantage, but this edge eroded across repeated quiz rounds while human persuaders maintained consistent effectiveness. This decay pattern is opposite to human-to-human persuasion, where rapport typically strengthens over time.