Language Understanding and Reasoning

Do LLM counter-arguments mirror writing style more than human ones do?

When language models generate arguments against social media posts, do they unconsciously adopt the stylistic features of what they're arguing against? This matters because it could reveal a detectable pattern that distinguishes LLM-written rebuttals from human-written ones.

Note · 2026-05-18 · sourced from Argumentation
Where exactly do LLMs break down with language structure? Why do LLMs fail at understanding what remains unsaid?

When LLMs generate counter-arguments on r/ChangeMyView, they unintentionally produce a signature: their replies converge stylistically with the original post they are replying to — substantially more than humans do. The convergence shows up across named entities, psycholinguistic features, and argument quality markers. Human replies remain stylistically more independent of the post's wording.

This is mechanistically interesting because it inverts the intuitive picture of LLM persuasion. The naive expectation is that LLMs produce a stable "house voice" regardless of input. The data show the opposite: LLMs mirror their context more than humans do, not less. The mechanism is plausibly attention-driven, since autoregressive generation conditioned on the prompt drags style toward the prompt. The social-theoretic framing is more useful, though: this looks like the structural form of communication accommodation, without the social motivation that drives humans to mirror selectively.

The detection consequence is direct. If you want to know whether a counter-argument was written by a model, the relational feature (how closely the reply resembles the post) is more informative than any absolute feature of the reply itself. Standard detection setups treat each text as an independent sample; this study suggests that pairing the reply with its provocation and measuring convergence gives the cleaner signal.
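A minimal sketch of what a relational feature could look like in practice. It is an illustration under stated assumptions, not the study's method: the study's own features (named entities, psycholinguistic features, argument quality markers) are not reproduced here. The function-word list, the cosine measure, and the names `style_profile`, `convergence`, and `mean_convergence` are all hypothetical stand-ins chosen for the example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small function-word list (an assumption for this
# sketch); a real analysis would use a fuller lexicon or richer feature set.
FUNCTION_WORDS = frozenset({
    "the", "a", "an", "of", "to", "in", "and", "but", "or", "not",
    "i", "you", "we", "it", "that", "this", "is", "are", "was", "be",
    "if", "so", "because",
})

def style_profile(text):
    """Count how often each listed function word occurs in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t in FUNCTION_WORDS)

def cosine(a, b):
    """Cosine similarity between two count vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def convergence(post, reply):
    """Relational feature: how closely the reply's profile tracks the post's."""
    return cosine(style_profile(post), style_profile(reply))

def mean_convergence(pairs):
    """Average convergence over (post, reply) pairs, e.g. one group of authors."""
    scores = [convergence(post, reply) for post, reply in pairs]
    return sum(scores) / len(scores) if scores else 0.0

if __name__ == "__main__":
    post = "I think the policy is not fair because it is a burden to the poor."
    reply = "You say it is a burden, but the policy is not that costly."
    print(round(convergence(post, reply), 3))
```

The point of the design is that the score is computed on the (post, reply) pair rather than on the reply alone. Comparing `mean_convergence` over pairs with known human replies against pairs with known model replies would be one way to check whether the gap described above shows up in a given dataset.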

The social-theoretic contrast this opens: humans accommodate selectively. They mirror friends and people they want to align with, and resist mirroring opponents. LLMs mirror unconditionally. This means an LLM replying to a post it is arguing against will still converge stylistically with that post, which would be socially incoherent if a human did it. The convergence is therefore not communicative accommodation in the social sense; it is a structural artifact masquerading as one.

Original note title

LLM counter-arguments converge stylistically with the post they reply to, while humans don't mirror, creating a detectable accommodation signature