Do LLM counter-arguments mirror writing style more than human counter-arguments do?
When language models generate arguments against social media posts, do they unconsciously adopt the stylistic features of what they're arguing against? This matters because it could reveal a detectable pattern that distinguishes LLM-written rebuttals from human-written ones.
When LLMs generate counter-arguments on r/ChangeMyView, they unintentionally produce a signature: their replies converge stylistically with the original post they are replying to — substantially more than humans do. The convergence shows up across named entities, psycholinguistic features, and argument quality markers. Human replies remain stylistically more independent of the post's wording.
This is mechanistically interesting because it inverts the intuitive picture of LLM persuasion. The naive expectation is that LLMs produce a stable "house voice" regardless of input. The data show the opposite: LLMs mirror their context more than humans do, not less. The mechanism is plausibly attention-driven, since autoregressive generation conditioned on the prompt drags style toward the prompt. The social-theoretic framing is more useful, though: this looks like the structural form of communication accommodation, without the social motivation that drives humans to mirror selectively.
The detection consequence is direct. If you want to know whether a counter-argument was written by a model, the relational feature (how the reply resembles the post) is more informative than any absolute feature of the reply itself. Standard detection setups treat each text as an independent sample; this study suggests pairing the reply with its provocation and measuring convergence is the cleaner signal.
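A minimal sketch of what a relational feature could look like, assuming a deliberately crude style representation. The feature set here (a short function-word list plus mean sentence and word length) is a hypothetical stand-in, not the study's named-entity, psycholinguistic, or argument-quality features, and the names `style_vector` and `convergence` are illustrative only. The point is the shape of the computation: the score is a function of the (post, reply) pair, not of the reply alone.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical proxy features: a tiny function-word list (an assumption,
# not the feature set used in the study).
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that",
                  "it", "is", "i", "you", "not", "but")


def style_vector(text: str) -> List[float]:
    """Map a text to a small vector of non-negative style features."""
    tokens = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = max(len(tokens), 1)
    counts = Counter(tokens)
    # Relative frequency of each function word.
    vec = [counts[w] / n for w in FUNCTION_WORDS]
    # Mean sentence length and mean word length, roughly rescaled
    # so they do not dominate the frequency features.
    vec.append(len(tokens) / max(len(sentences), 1) / 50.0)
    vec.append(sum(len(t) for t in tokens) / n / 10.0)
    return vec


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def convergence(post: str, reply: str) -> float:
    """Relational feature: stylistic similarity of a reply to its post."""
    return cosine(style_vector(post), style_vector(reply))
```

In a detection setup, a score like `convergence(post, reply)` would be computed for every post-reply pair and fed to a classifier alongside, or instead of, features of the reply in isolation. Any threshold or weighting would have to be fit on labeled data; none is implied here.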
This also sharpens the social-theoretic picture. Humans accommodate selectively: they mirror friends and people they want to align with, and resist mirroring opponents. LLMs mirror unconditionally. An LLM replying to a post it is arguing against will therefore still converge stylistically with that post, which would be socially incoherent if a human did it. The convergence is not communicative accommodation in the social sense; it is a structural artifact masquerading as one.
Related concepts in this collection
- Do LLMs and humans persuade through the same mechanisms?
  If LLM and human arguments achieve equal persuasive force, does that mean they work the same way? This explores whether equivalent outcomes hide fundamentally different rhetorical strategies.
  extends the equivalence-with-divergent-mechanisms picture: stylistic mirroring is part of how LLMs achieve equivalent persuasive force
- Can simple linguistic features detect AI-written arguments?
  Can interpretable linguistic patterns reliably distinguish LLM-generated counter-arguments from human-written ones in persuasive contexts? This matters because simple, auditable detection might outperform expensive neural approaches.
  the same paper's detection result depends partly on this convergence signal
- Do LLMs use moral language more than humans?
  This explores whether large language models rely more heavily on appeals to care, fairness, authority, and sanctity than human arguers do, and whether this difference persists when emotional tone remains equivalent.
  a different production signature also separating LLM and human persuasion
Original note title
LLM counter-arguments converge stylistically with the post they reply to — humans don't mirror creating a detectable accommodation signature