AI Argues Differently: Distinct Argumentative and Linguistic Patterns of LLMs in Persuasive Contexts
Distinguishing LLM-generated text from human-written text is a key challenge for safe and ethical NLP, particularly in high-stakes settings such as persuasive online discourse. While recent work focuses on detection, real-world use cases also demand interpretable tools that help humans understand and distinguish LLM-generated texts. To this end, we present an analysis framework comparing human- and LLM-authored arguments using two easily interpretable feature sets: general-purpose linguistic features (e.g., lexical richness, syntactic complexity) and domain-specific features related to argument quality (e.g., logical soundness, engagement strategies). Applied to /r/ChangeMyView arguments by humans and three LLMs, our method reveals clear patterns: LLM-generated counter-arguments show lower type-token and lemma-token ratios but higher emotional intensity, particularly in anticipation and trust. They more closely resemble textbook-quality arguments: cogent, justified, explicitly respectful toward others, and positive in tone. Moreover, counter-arguments generated by LLMs converge more closely with the original post's style and quality than those written by humans.
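To make the lexical richness feature concrete, the following is a minimal sketch of a type-token ratio (distinct word forms divided by total tokens). The regex tokenizer and the function names `tokenize` and `type_token_ratio` are illustrative assumptions, not the feature extraction pipeline used in this work. A lemma-token ratio would follow the same pattern after mapping each token to its lemma with a lemmatizer, which is omitted here.

```python
import re


def tokenize(text):
    # Assumption: lowercase alphabetic tokens (with apostrophes) are a
    # good enough approximation of words for illustration purposes.
    return re.findall(r"[a-z']+", text.lower())


def type_token_ratio(text):
    # Ratio of distinct tokens (types) to total tokens.
    # Returns 0.0 for empty input to avoid division by zero.
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


if __name__ == "__main__":
    repetitive = "the cat and the dog"
    varied = "a quick brown fox jumps"
    print(type_token_ratio(repetitive))
    print(type_token_ratio(varied))
```

Under this definition, a lower ratio indicates more repeated vocabulary. Note that the raw ratio is sensitive to text length, so comparisons are most meaningful between texts of similar length.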
Analyzing distributional differences between human and LLM arguments in persuasive discourse, we find substantial differences in both style and argument quality: LLM arguments show higher emotional positivity, stronger convergence with original posts (especially in named entities and psycholinguistic features), and greater alignment with argument quality markers. In contrast, human arguments display more negative emotion, greater lexical and syntactic creativity, and stronger use of interactive discourse.
Moreover, we show that linguistic and argument quality features distinguish LLM-generated comments on CMV posts from human-written ones with nearly 99% accuracy. Our approach thus offers a practical safeguard against unethical uses of LLMs in online discussions. Furthermore, tests on an external benchmark show that our lightweight and interpretable method performs comparably to computationally intensive detectors in generalized detection scenarios, highlighting the viability of low-resource, transparent detection methods.
These results prompt important questions for future research: Under what conditions are LLM-generated texts harder to detect? How do prompt design and task objective influence detectability? How do the convergence patterns of humans and LLMs align with social theories of communication, such as communication accommodation theory? Our framework provides a straightforward and interpretable approach to such questions, thereby facilitating future investigations into the nuances of LLM-generated content.