Language Understanding and Reasoning

Can simple linguistic features detect AI-written arguments?

Can interpretable linguistic patterns reliably distinguish LLM-generated counter-arguments from human-written ones in persuasive contexts? This matters because simple, auditable detection might match expensive neural approaches at a fraction of the compute.

Note · 2026-05-18 · sourced from Argumentation

A combination of general-purpose linguistic features (lexical richness, syntactic complexity, type-token ratios) and argument-quality features (logical soundness, justification, engagement strategy) detects LLM-generated counter-arguments on r/ChangeMyView (CMV) with nearly 99% accuracy. The features are interpretable: each one names what it detects, and the detector is computationally cheap. On external benchmarks, this lightweight method performs comparably to heavyweight neural detectors in generalized detection scenarios.
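To make "general-purpose linguistic features" concrete, here is a minimal sketch of how a few surface features could be computed with the Python standard library. The function names, the regex tokenizer, and the choice of three features are illustrative assumptions, not the feature set used in the CMV study, and the argument-quality features (logical soundness, justification, engagement strategy) are not covered here.

```python
import re

def tokenize(text):
    """Lowercase word tokens from a deliberately simple regex tokenizer (an assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def split_sentences(text):
    """Naive sentence split: break after ., ! or ? when followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def linguistic_features(text):
    """Return a dict of named, human-readable surface features for one text."""
    tokens = tokenize(text)
    sentences = split_sentences(text)
    n_tokens = len(tokens)
    return {
        # Lexical richness: distinct word types divided by total tokens.
        "type_token_ratio": len(set(tokens)) / n_tokens if n_tokens else 0.0,
        # Crude proxy for syntactic complexity: average tokens per sentence.
        "mean_sentence_length": n_tokens / len(sentences) if sentences else 0.0,
        # Average characters per token.
        "mean_word_length": sum(len(t) for t in tokens) / n_tokens if n_tokens else 0.0,
    }

if __name__ == "__main__":
    print(linguistic_features("The cat sat. The cat ran!"))
```

Each value has a name a reviewer can read and question. Note that a raw type-token ratio is sensitive to text length, so comparing it across comments of very different sizes would need some form of length normalization.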

The methodological point matters more than the accuracy number. Detection research has trended toward black-box classifiers: fine-tuned transformers that produce a yes/no without an explanation. The CMV result is the inverse: pick the right interpretable features and you get comparable performance for a fraction of the compute, with the audit trail built in. The features do the work; the classifier is a wrapper.
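One way to picture "the classifier is a wrapper" is a plain linear scorer whose output can be broken down feature by feature. The sketch below is an assumption for illustration only: the feature names, weights, and bias are hypothetical and not fitted to any data, and the note does not say which classifier the authors used.

```python
import math

# Hypothetical weights and bias, chosen only for illustration (not fitted to any data).
WEIGHTS = {
    "type_token_ratio": -2.0,
    "mean_sentence_length": 0.1,
    "positive_word_rate": 5.0,
}
BIAS = -1.0

def explain_score(features):
    """Return (probability of 'LLM-generated', per-feature contributions to the logit)."""
    contributions = {name: WEIGHTS[name] * features[name] for name in WEIGHTS}
    logit = BIAS + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-logit))
    return probability, contributions

if __name__ == "__main__":
    example = {
        "type_token_ratio": 0.5,
        "mean_sentence_length": 20.0,
        "positive_word_rate": 0.0,
    }
    prob, parts = explain_score(example)
    print(f"p(LLM) = {prob:.2f}")
    # Listing contributions by magnitude is the audit trail: which feature moved the score.
    for name, value in sorted(parts.items(), key=lambda kv: -abs(kv[1])):
        print(f"{name}: {value:+.2f}")
```

Because the score is a sum of named terms, every decision can be traced back to the features that drove it, which is the property a fine-tuned transformer does not offer by default.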

The result is established for one specific context, persuasive counter-arguments on CMV, and the authors are careful to flag the open questions: how prompt design affects detectability, how task type interacts with the feature signature, and how these features behave under adversarial paraphrase. The 99% figure is a ceiling for a specific genre, not a universal claim about LLM detection.

The forensic implication is the durable part. As long as LLM production mechanisms differ structurally from human production (stylistic mirroring of prompts, higher emotional positivity, textbook-quality argument markers), interpretable feature-based detection will have a target. Robust evasion would require LLMs to produce text whose features are human-like, not merely text whose content is convincing. That is a much harder optimization problem than the one current LLM training is aimed at.

Original note title

lightweight interpretable linguistic features achieve 99 percent accuracy detecting LLM-generated counter-arguments in persuasive discourse