Can you detect LLM arguments by measuring convergence with the original post?

This explores whether LLM-written arguments give themselves away not by their own style, but by how closely they echo the post they're replying to — convergence as a detection signal.

The corpus says yes, and it's one of the cleaner detection stories in the collection. On r/ChangeMyView, LLM replies align more tightly with the original post than human replies do, across writing style, the named entities they mention, and psycholinguistic features (Do LLM counter-arguments mirror writing style more than humans?). The crucial move is that this is a *relational* signal: you're not measuring properties of the reply in isolation, you're measuring the distance between the reply and what it's answering. Humans, replying in their own voice and bringing in their own references, keep more distance.
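The notes don't spell out the study's exact feature definitions, so what follows is only a minimal sketch of what "measuring convergence" can look like. It rests on three stand-ins, all assumptions. Style is approximated by function-word frequencies compared with cosine similarity. Named entities are approximated by capitalized words that don't start a sentence, a crude substitute for a real NER model. Psycholinguistic features come from a tiny hypothetical category lexicon (`CATEGORIES`), not a published dictionary. The point to notice is that every score compares the reply against the post; none of them looks at the reply alone.

```python
import math
import re
from collections import Counter

# Hypothetical function-word list used as a crude style profile.
FUNCTION_WORDS = ["the", "a", "an", "and", "but", "or", "of", "to", "in",
                  "that", "is", "it", "you", "i", "we", "not", "this", "because"]

# Hypothetical mini-lexicon standing in for a psycholinguistic
# category dictionary; not a real published resource.
CATEGORIES = {
    "certainty": {"always", "never", "clearly", "definitely", "certainly"},
    "tentative": {"maybe", "perhaps", "might", "possibly", "seems"},
    "negation": {"not", "no", "never", "none"},
}


def tokens(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def jaccard(a, b):
    """Set overlap in [0, 1]; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def style_vector(text):
    """Relative frequency of each function word."""
    counts = Counter(tokens(text))
    total = sum(counts.values()) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def category_vector(text):
    """Share of tokens falling in each lexicon category."""
    toks = tokens(text)
    total = len(toks) or 1
    return [sum(t in words for t in toks) / total for words in CATEGORIES.values()]


def entity_set(text):
    """Crude named-entity proxy: capitalized words that do not start a sentence."""
    ents = set()
    for sentence in re.split(r"[.!?]+\s*", text):
        words = re.findall(r"[A-Za-z']+", sentence)
        for w in words[1:]:  # skip the sentence-initial word
            if len(w) > 1 and w[0].isupper():
                ents.add(w.lower())
    return ents


def convergence_features(post, reply):
    """Relational features: how closely the reply tracks the post (higher = closer)."""
    return {
        "style_sim": cosine(style_vector(post), style_vector(reply)),
        "entity_overlap": jaccard(entity_set(post), entity_set(reply)),
        "psych_sim": cosine(category_vector(post), category_vector(reply)),
    }


post = "I think Paris is clearly better than London because the Metro is cheap."
echo = "You are right that Paris is clearly better than London, and the Metro is cheap."
independent = "Rent in Tokyo might be lower, though maybe not."
print(convergence_features(post, echo))
print(convergence_features(post, independent))
```

A reply that echoes the post scores high on all three features. A reply that brings its own references scores near zero. A detector built this way would feed these relational scores, not the raw reply text, into its classifier.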

Why would a machine hug the post so closely? The mechanism shows up elsewhere in the corpus under different names. Autoregressive generation continues along the trajectory the prompt sets up rather than striking out on its own: token prediction is a smooth probabilistic flow that follows the training distribution instead of exploring counterpositions (Does LLM generation explore competing claims while producing text?). Framed in terms of argument, the same tendency reads as conformity. LLMs hold the *shape* of whatever argument the user is building rather than defending a position of their own (Do LLMs actually hold stable positions or just mirror user arguments?). Convergence with the original post is what that shape-holding looks like when you measure it.

Useful to know: you don't need a heavyweight detector to catch this. Simple, interpretable linguistic features, combined with argument-quality measures, reached 99% accuracy at spotting LLM counter-arguments on r/ChangeMyView, matching neural detectors while staying cheap and transparent (Can simple linguistic features detect AI-written arguments?). Part of that signature is exactly the accommodation behavior: the model mirrors the prompt and produces textbook-quality argument markers that humans don't bother to replicate. So convergence with the post and these stylistic tells are two readings of the same underlying habit.
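The 99% figure belongs to the cited study, and the sketch below does not reproduce it. It only illustrates why a detector of this kind stays transparent. It assumes two hypothetical features: a post-reply convergence score (taken as given, in [0, 1], of the kind described earlier) and the rate of textbook argument markers drawn from a made-up `MARKERS` list. The six examples are synthetic and invented for illustration. A plain logistic regression trained by batch gradient descent stands in for whatever classifier the study actually used.

```python
import math
import re

# Hypothetical list of "textbook" argument markers (an illustrative
# assumption, not the feature set used in the cited study).
MARKERS = ["firstly", "secondly", "furthermore", "moreover", "however",
           "in conclusion", "on the other hand", "additionally"]


def marker_rate(text):
    """Textbook-marker hits per word: a crude argument-style proxy."""
    lower = text.lower()
    n_words = len(re.findall(r"[a-z']+", lower)) or 1
    hits = sum(len(re.findall(r"\b" + re.escape(m) + r"\b", lower)) for m in MARKERS)
    return hits / n_words


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    """Probability that a reply is LLM-written under weights w and bias b."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=1.0, epochs=3000):
    """Batch gradient descent on the average log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic toy examples: (convergence score with the post, reply text, label),
# where label 1 = LLM-written and 0 = human. Invented for illustration only.
EXAMPLES = [
    (0.82, "Firstly, you raise a fair point. However, moreover, consider costs.", 1),
    (0.77, "Furthermore, the argument holds. In conclusion, it depends.", 1),
    (0.90, "Additionally, your framing is valid. However, nuance matters.", 1),
    (0.31, "nah, my cousin lived there for ten years and hated it", 0),
    (0.25, "Have you actually tried the buses in Leeds? Different story.", 0),
    (0.40, "I used to think that too until I moved.", 0),
]

X = [[conv, marker_rate(text)] for conv, text, _ in EXAMPLES]
y = [label for _, _, label in EXAMPLES]
weights, bias = train_logistic(X, y)
print("weights [convergence, markers]:", weights, "bias:", bias)
```

Because the model is linear, each learned weight reads directly as how strongly that feature pushes a reply toward the LLM label. That is the kind of transparency the paragraph credits simple features with. With real data you would score a held-out set rather than the training examples.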

The lateral surprise is that the very thing that makes LLM arguments *detectable* is also what makes them rhetorically thin. A reply that accommodates its target so faithfully isn't bringing outside force to bear. The corpus notes that models lose the social context that gives expert claims their weight (reputation, standing, track record) because they only process text (Can language models distinguish expert arguments from common assumptions?). The fingerprint and the weakness are one and the same: an argument generated by continuing a post will resemble that post more than it challenges it.

Sources (5 notes)

Do LLM counter-arguments mirror writing style more than humans?

Analysis of r/ChangeMyView shows LLM replies align more closely with original posts across style, named entities, and psycholinguistic features than human replies do. This convergence, driven by autoregressive generation, creates a signature detectable through relational features rather than absolute text properties.

Does LLM generation explore competing claims while producing text?

Token prediction trains models to continue toward the training distribution, not to explore logically related counterpositions. The smoothness of the process carries over to its output: claims multiply without generating new perspectives.

Do LLMs actually hold stable positions or just mirror user arguments?

Language models generate outputs that match the trajectory implied by each prompt, rather than maintaining stable stances across interactions. This shape-holding is distinct from position-holding: the model produces argument-like text shaped by user framing, not from any underlying commitment being defended.

Can simple linguistic features detect AI-written arguments?

General linguistic features combined with argument-quality measures achieved 99% accuracy detecting LLM-generated counter-arguments on r/ChangeMyView, matching heavyweight neural detectors while remaining computationally cheap and transparent. LLMs produce detectable stylistic signatures: accommodation to prompts and textbook-quality argument markers that humans don't replicate.

Can language models distinguish expert arguments from common assumptions?

LLMs lose the social context that gives expert claims their force—reputation, track record, and standing—because they process only text, not the social world where expertise is built and evaluated.
