Large Language Models are as persuasive as humans, but how? On the cognitive effort and moral-emotional language of LLM arguments
Abstract. Large Language Models (LLMs) are already as persuasive as humans. However, we know very little about how they achieve this. This paper investigates the persuasion strategies of LLMs, comparing them with human-generated arguments. Using data from an experiment with 1,251 participants, we compare the persuasion strategies of LLM-generated and human-generated arguments through measures of cognitive effort (lexical and grammatical complexity) and moral-emotional language (sentiment and morality). Our results indicate that LLMs produce arguments that require higher cognitive effort, exhibiting more complex grammatical and lexical structures than their human counterparts. Additionally, LLMs demonstrate a significant propensity to engage more deeply with moral language, utilizing both positive and negative moral foundations more frequently than humans. In contrast with previous research, no significant difference was found in the emotional content produced by LLMs and humans. By showing that there is no equivalence in process despite equivalence in outcome, we contribute to the emerging knowledge on AI and persuasion, highlighting the dual potential of LLMs to both enhance and undermine informational integrity through their persuasion strategies.
Large Language Models (LLMs) can create content that is highly persuasive, equaling or even surpassing (Hackenburg et al., 2023) the effectiveness of humans in convincing users about contentious political issues (Bai et al., 2023). These findings raise concerns, since LLMs are capable of producing content as persuasive as original propaganda crafted by humans (Goldstein et al., 2024). Moreover, the persuasiveness of deceptive content increases when LLMs can access personal information to tailor messages to specific audiences, allowing for cheap automation of persuasive misinformation at a huge scale and increasing not just its efficiency but also its effectiveness (Costello et al., 2024; Matz et al., 2024). If that were not enough, these results are even more worrying when we take into account that the persuasion capabilities of LLMs are increasing as these models evolve (Durmus et al., 2024), with the potential of creating a perfect storm of misinformation (Galaz et al., 2023). However, despite the empirical evidence that LLMs are already as persuasive as humans, and likely to surpass them as they continue to develop, we know very little about how they do it.
To shed light on how LLMs manage to be as persuasive as humans, we rely on previous evidence on communication strategies showing that cognitive effort and moral-emotional language are associated with higher persuasiveness. For example, existing evidence indicates that each additional negative word in a headline boosts the click-through rate by 2.3% (Robertson et al., 2023). Moreover, previous evidence shows that reduced cognitive effort to process content is associated with viral misinformation (Carrasco-Farré, 2022). Lastly, research on "moral-emotional" language suggests that high-arousal, morally charged, and emotional rhetoric is highly persuasive (Brady et al., 2017; Rathje et al., 2021).
In order to test whether LLMs apply these communication strategies to achieve human-level persuasion, we rely on an experiment carried out at Anthropic to compare the persuasiveness of LLMs and humans (N = 1,251) through 56 claims on different topics, with arguments both written by humans and generated by AI models (Durmus et al., 2024). Persuasiveness is measured based on the shift in agreement with the claims before and after exposure to the LLM/human arguments. Building on that, we analyze the differences between LLM and human arguments in terms of cognitive effort (lexical and grammatical complexity), appeal to moral-emotional language (sentiment and morality), and the moral foundations used in each argument. In addition, we repeat the analysis comparing different LLM prompts that elicit different persuasion strategies. Our results indicate that arguments from LLMs require higher cognitive effort than human arguments, are equally neutral in terms of sentiment, but appeal more to morality than human arguments do.
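As a rough formalization (our notation, not taken from Durmus et al., 2024), the persuasiveness of an argument can be expressed as the average pre/post shift in agreement among the participants exposed to it:

$$\text{Persuasiveness}(a) = \frac{1}{N_a}\sum_{i=1}^{N_a}\left(\text{Agreement}^{\text{post}}_{i,a} - \text{Agreement}^{\text{pre}}_{i,a}\right)$$

where $N_a$ is the number of participants who read argument $a$.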
Such results are important amidst concerns and discussions over LLM persuasiveness, digital misinformation, and AI ethics, as they can guide communication scientists in understanding how information processing influences persuasion. For policymakers, technologists, and educators, these findings stress the importance of developing robust strategies to counterbalance the risks posed by persuasive LLMs, including literacy efforts that help people discern AI-generated content and ethical frameworks for AI use that prevent manipulation and enhance the integrity of digital communication. Taken together, these measures aim to uphold the integrity of digital communication, fostering an informed society that is resilient to the subversive potential of persuasive LLMs.
2.2. Communicative strategies for persuasion
Persuasion is a key subject within psychology, which involves trying to influence the thoughts, emotions, or actions of others through different communication strategies (Briñol & Petty, 2012; Petty & Briñol, 2015; Rocklage et al., 2018). Two of the most prominent communication strategies for persuasion are related to the cognitive effort required to process the argument, and the moral-emotional language used to convey it.
2.2.1. Cognitive effort
When the cognitive effort to process a given text is lower, tasks become simpler, making individuals more inclined to persist in performing them (Kool et al., 2010; Zipf, 1949). This is why lower cognitive effort for processing arguments results in a more positive emotional response (Alter & Oppenheimer, 2019), increased focus (Berger et al., 2023), and, ultimately, more persuasive arguments (Manzoor et al., 2024). Overall, previous evidence shows that increased readability and lower complexity are associated with higher persuasion levels (Packard et al., 2023). However, other studies indicate that increased processing can be advantageous. For example, Kanuri et al. (2018) suggest that social media content that demands higher cognitive processing garners increased engagement. Therefore, it is possible that characteristics of the argument that enhance cognitive processing could help maintain focus and promote further engagement.
The level of cognitive effort required to process a textual argument is traditionally operationalized through two measures. The first is the grammatical complexity of the content, commonly known as “readability” (Manzoor et al., 2024). This is assessed by identifying features of a sentence that increase or decrease the cognitive effort required to understand it, such as sentence length, the number of words within the sentence, or the use of subordinate forms. For example, the sentence “The cat sat on the mat” is easier to process than the following alternative with the exact same meaning: “The cat, exhibiting typical feline behavior characteristic of its species when seeking comfort, positioned itself centrally upon the woven mat.”
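As an illustrative sketch, grammatical complexity can be scored with a grade-level readability formula; here we assume a Flesch-Kincaid-style measure and the textstat package, which are our own choices rather than the specific operationalization used in the studies cited above:

```python
# pip install textstat
import textstat

# The two example sentences from above; a higher grade level indicates that
# more cognitive effort is required to process the sentence.
simple = "The cat sat on the mat."
complex_ = ("The cat, exhibiting typical feline behavior characteristic of its species "
            "when seeking comfort, positioned itself centrally upon the woven mat.")

for text in (simple, complex_):
    # Flesch-Kincaid grade level combines sentence length and syllables per word.
    print(round(textstat.flesch_kincaid_grade(text), 2), "-", text[:40], "...")
```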
The second is the lexical complexity of the argument. Previous evidence shows that higher lexical complexity is associated with higher cognitive effort needed to process a given argument (Berger et al., 2023; Pitler & Nenkova, 2008; Schwarm & Ostendorf, 2005). For example, the sentence “Psycholinguistics studies how we learn and use words” has less lexical diversity than “The field of psycholinguistics contemplates the intricate phenomena of lexicon formation within the cognitive framework of individual language acquisition and systemic grammatical convergence.”
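Since lexical complexity is later reported as language-model perplexity, a minimal sketch of how such a score could be computed is shown below; GPT-2 is our assumed scoring model here, and the actual model used may differ:

```python
# pip install torch transformers
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    # Score the text with the language model; exponentiating the mean
    # token-level cross-entropy loss yields perplexity (higher = more lexically complex).
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(enc.input_ids, labels=enc.input_ids).loss
    return torch.exp(loss).item()

print(perplexity("Psycholinguistics studies how we learn and use words."))
```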
2.2.2. Moral-emotional language
Previous evidence shows that people tend to increase the emotional intensity of their arguments in order to influence others' opinions (Rocklage et al., 2018). This communicative strategy of employing emotional language in efforts to persuade is based on the Aristotelian pathos, which argues that effective persuasion involves evoking the right emotional response in the audience (Formanowicz et al., 2023). Indeed, experimental evidence underscores emotional language’s causal impact on attention and persuasion (Berger et al., 2023; Tannenbaum et al., 2015), indicating that emotionality is a natural tool in influence attempts (Rocklage et al., 2018). Furthermore, emotions are often closely linked to moral evaluations (Brady et al., 2017; Horberg et al., 2011; Rozin et al., 1999).
Morality, that is, people’s beliefs about right and wrong (Ellemers & van den Bos, 2012; Haidt, 2003), is also intricately tied to the effectiveness of written content (van Bavel et al., 2024). Moral Foundations Theory (MFT) proposes that five moral foundations - care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and sanctity/degradation - are essential to human moral reasoning, exist across cultures, and dictate how people evaluate ethical circumstances (Graham et al., 2012). Indeed, previous evidence shows that morally charged content tends to capture attention and foster engagement more readily (Brady et al., 2000a, 2000b). Moreover, previous research has indicated that moralized language, particularly when coupled with emotional elements, can significantly increase the sharing and dissemination of content within social networks, thereby enhancing persuasiveness (Marwick, 2021). The spread of moral-emotional language in online discourse is largely driven by its ability to resonate with individuals’ deep-seated moral values and emotional responses, which are crucial in the context of persuasive communication (Brady et al., 2023).
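As a hedged illustration of how moral foundation counts of the kind reported in the results can be computed, the sketch below tallies dictionary matches per foundation; the word lists are tiny hypothetical stand-ins, not the actual Moral Foundations Dictionary:

```python
import re

# Illustrative (hypothetical) word lists; a real analysis would use a full
# lexicon with separate vice and virtue terms for each foundation.
MORAL_LEXICON = {
    "care": {"care", "protect", "compassion", "harm", "hurt"},
    "fairness": {"fair", "justice", "rights", "cheat"},
    "loyalty": {"loyal", "solidarity", "betray", "together"},
    "authority": {"authority", "law", "duty", "defy"},
    "sanctity": {"pure", "sacred", "degrade", "disgust"},
}

def moral_counts(argument: str) -> dict:
    # Tokenize the argument and count exact matches against each foundation's word list.
    tokens = re.findall(r"[a-z']+", argument.lower())
    return {f: sum(tok in words for tok in tokens) for f, words in MORAL_LEXICON.items()}

counts = moral_counts("A fair law protects the rights of every citizen from harm.")
print(counts, "total moral terms:", sum(counts.values()))
```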
4.1. Overall results
Our results, visualized in Figure 1, indicate a statistically significant difference in readability scores between arguments authored by humans and those generated by LLMs (p < .001; [-1.36, -0.83]). LLMs produced arguments that require a higher cognitive effort (mean = 13.26) compared to human-authored arguments (mean = 12.16), suggesting that arguments generated by LLMs tend to be more grammatically complex. In terms of lexical complexity, measured by perplexity, LLMs again demonstrated a significantly higher mean score (mean = 111.39) than humans (mean = 102.69). The observed mean difference (-8.695) indicates that LLM-generated arguments were more lexically complex than those produced by humans (p < .001; [-10.01, -7.38]).
In contrast, sentiment analysis did not show a significant difference in emotional polarity between human- and LLM-generated arguments (p = .980; [-1.39, 1.36]). The means were virtually identical, with human-authored arguments having a mean sentiment score of 0.98 and LLM-authored arguments a mean of 1.00. The use of moral-emotional language, as captured by the total moral foundation count, did differ significantly between the two sources (p < .001; [-2.92, -1.44]). The average morality score for LLMs was 12.09, while for humans it was 9.91, resulting in a mean difference of -2.18 and indicating that LLMs tend to incorporate more moral language into their arguments than humans do.
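For reference, the kind of comparison reported above (mean difference, p-value, confidence interval) can be reproduced with a standard two-sample test; the scores below are simulated placeholders, not the actual data from Durmus et al. (2024):

```python
# pip install numpy scipy
import numpy as np
from scipy import stats

# Simulated readability scores standing in for the human and LLM arguments.
rng = np.random.default_rng(42)
human = rng.normal(loc=12.16, scale=3.0, size=600)
llm = rng.normal(loc=13.26, scale=3.0, size=600)

# Welch's t-test (no equal-variance assumption) and an approximate 95% CI for
# the human-minus-LLM mean difference, mirroring the reporting convention above.
t_stat, p_value = stats.ttest_ind(human, llm, equal_var=False)
diff = human.mean() - llm.mean()
se = np.sqrt(human.var(ddof=1) / len(human) + llm.var(ddof=1) / len(llm))
ci_low, ci_high = diff - 1.96 * se, diff + 1.96 * se
print(f"mean difference = {diff:.2f}, p = {p_value:.4f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```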
When analyzing their differing appeal to morality in more detail, our results also reveal statistically significant differences (see Figure 2). The use of overall positive moral foundations showed a significant difference (p < .001; [-1.92, -0.77]), with LLMs (mean = 8.66) employing these elements more than humans (mean = 7.32), resulting in a mean difference of -1.34 and indicating a clear distinction in the positive moral content of LLM-generated arguments. When examining individual positive moral foundations, we observed the following.
Arguments from LLMs displayed a higher average use of care-related virtues (mean = 3.44) compared to humans (mean = 2.99), with a statistically significant mean difference of -0.45 (p = .007; [-0.78, -0.12]). Fairness was also more prevalent in LLM-authored arguments (mean = 0.92) than in human-authored ones (mean = 0.68), with a mean difference of -0.24 (p = .001; [-0.39, -0.10]). Moreover, LLMs exhibited a higher utilization of authority virtues (mean = 1.80) compared to humans (mean = 1.40), with a mean difference of -0.40 (p = .002; [-0.65, -0.15]). As for sanctity virtues, there was a modestly higher representation in arguments by LLMs (mean = 0.70) than by humans (mean = 0.52), with a mean difference of -0.18 (p = .017; [-0.33, -0.03]). In contrast, no significant difference was found in the use of loyalty virtues (p = .499; [-0.31, 0.15]) between LLMs (mean = 1.81) and humans (mean = 1.73).
Discussion
Our results show a counterintuitive relationship between cognitive effort and persuasiveness in LLM-generated arguments. Contrary to previous findings suggesting that lower cognitive effort is associated with higher persuasion levels (Alter & Oppenheimer, 2019; Berger et al., 2023; Kool et al., 2010; Manzoor et al., 2024; Packard et al., 2023), our results indicate that LLM arguments, which require higher cognitive effort due to increased grammatical and lexical complexity (Carrasco-Farré, 2022), are as persuasive as human-authored arguments. This finding aligns with the suggestion by Kanuri et al. (2018) that higher cognitive processing can promote engagement, implying that the increased complexity of LLM-generated arguments does not hinder their persuasive power. Instead, the complexity might encourage deeper cognitive engagement (Kanuri et al., 2018), prompting readers to invest more mental effort in processing the arguments and potentially leading to more persuasion, as readers may interpret the need for such cognitive investment as a sign of the argument's substance or importance.
Our analysis further highlights the importance of the moral component within moral-emotional language in persuasion. LLMs, with their higher use of moral language, both positive and negative, were shown to be as persuasive as humans. In the case of LLMs, this supports the proposition by Brady et al. (2017) and Rocklage et al. (2018) that morally laden language significantly impacts attention and can be highly persuasive, and that these dimensions are universally resonant in moral reasoning (Graham et al., 2012). Interestingly, the finding that LLMs utilize negative moral foundations more frequently, particularly harm- and cheating-related language, may reflect a strategic use of moral-emotional language that aligns with the persuasive strategy of negativity bias, where negative information tends to influence judgments more than equivalent positive information (Robertson et al., 2023; Rozin & Royzman, 2001). On the other hand, the negligible difference found in the sentiment analysis points to a cautious understanding of emotional content in persuasion from LLMs. The similarity in sentiment scores between LLMs and humans suggests that the mere emotional charge of the language may not be as pivotal as the moral framing of the content, aligning with the view that morality can be a stronger driver of persuasion than emotion alone (van Bavel et al., 2024).
However, we should interpret these results cautiously. First, because the aim of the paper is not to prove a causal relationship between communicative strategies and persuasiveness, but to show that LLMs use different persuasion strategies compared to humans while reaching the same level of persuasiveness. In other words, our paper shows that there is no equivalence in process despite equivalence in outcome. Nevertheless, the fact that LLMs and humans are equally persuasive despite the observed differences in communicative strategies does not necessarily mean that these factors have no effect on persuasion. Equal persuasiveness does not imply that the processes leading to that outcome are identical. In fact, our results indicate that LLMs and humans are equally effective in persuading, but they do so through different strategies, which is a crucial distinction for understanding the full capabilities and limitations of LLMs compared to human arguments.
Second, the empirical observation that LLMs, despite employing a higher frequency and proportion of moral terms, are as persuasive as human arguments (Durmus et al., 2024) suggests either that the quantity of moral language alone does not linearly enhance persuasiveness, or that moral language interacts with other variables not observed in the data, for example, users' prior beliefs (Durmus & Cardie, 2019). This finding could imply that there are limits to the effectiveness of moral language in persuasion, and that simply increasing moral content is not necessarily a catalyst for greater LLM persuasive impact. Alternatively, it could also point towards compensation and balancing effects. Even though LLMs exhibit higher complexity and more frequent use of moral language, these characteristics could be compensating for each other or balancing out in ways that preserve overall persuasiveness. For example, higher complexity could potentially detract from persuasiveness due to increased cognitive load (Alter & Oppenheimer, 2019; Berger et al., 2023; Kool et al., 2010; Manzoor et al., 2024; Packard et al., 2023), but the greater use of moral-emotional language could enhance persuasiveness (van Bavel et al., 2024), thus counterbalancing any negative effects.
Moving on to comparing our results with other studies on LLM persuasiveness, previous research has shown that rational arguments, alternative explanations, or counterevidence are more effective for persuasion than psychological approaches such as reducing cognitive effort or using moral-emotional language (Costello et al., 2024). However, there is a difference between the experimental setting of Costello et al. (2024) and the dataset we have analyzed (Durmus et al., 2024). While Costello et al. (2024) disclosed to participants when they were interacting with an LLM, this was not the case for Durmus et al. (2024), which may explain the diverging results that we obtained in this paper. Since previous research has shown that people attribute more impartiality to AI than to humans (Claudy et al., 2022; Logg et al., 2019), future research should investigate whether knowingly interacting with an LLM increases the effect of rational arguments, and whether the opposite holds when participants do not know who they are interacting with.