Linguistic markers of inherently false AI communication and intentionally false human communication: Evidence from hotel reviews


To the human eye, AI-generated outputs of large language models have increasingly become indistinguishable from human-generated outputs. Therefore, to determine the linguistic properties that separate AI-generated text from human-generated text, we used a state-of-the-art chatbot, ChatGPT, and compared hotel reviews it wrote to human-generated counterparts across content (emotion), style (analytic writing, adjectives), and structural features (readability). Results suggested AI-generated text had a more analytic style and was more affective, more descriptive, and less readable than human-generated text. Classification accuracies of AI-generated vs. human-generated texts were over 80%, far exceeding chance (~50%). Here, we argue AI-generated text is inherently false when communicating about personal experiences that are typical of humans, and differs from intentionally false human-generated text at the language level. Implications for AI-Mediated Communication and deception research are discussed.

The present investigation is timely and important for several reasons. First, the current work draws on and extends a framework of AI-Mediated Communication (AIMC) to identify how words signal AI-generated and human-generated text (Hancock et al., 2020; Jakesch et al., 2019). Human-generated hotel reviews were compared to AI-generated hotel reviews to identify how such text types differ in terms of content (e.g., emotion), style (e.g., analytic writing, adjectives), and structure (e.g., readability). Second, by definition, AI-generated text about certain human experiences is false because AI systems produce an output that is fictitious (Evans et al., 2022). In other words, AI systems can create reality from fantasy, such as writing a hotel review from the perspective of a non-existent human who never stayed at a hotel. This requires an investigation into the differences between AI-generated text that we suggest is inherently false (e.g., an AI wrote like it had an experience, but this experience could never have occurred) and human-generated text that is intentionally false when writing about personal experiences (e.g., a human wrote like it had an experience, but this experience did not occur and it is therefore deceptive). Such an evaluation can illuminate how intentionality (e.g., being purposefully misleading and withholding the truth from others) (Levine, 2014) is revealed in language. Most agree that an AI cannot have intentionality because this requires consciousness (Husserl, 1913), but a human can. Together, this work simultaneously addresses a need to understand how AI and human language differ, and how false statements by an AI are different from false statements by a human as approximated by word patterns.

To demonstrate a key point of this paper — how AI can produce a communication output to achieve some goal that appears real, despite being fabricated — the prior paragraph of this section was written by a large language model and chatbot, ChatGPT (except for the first sentence).1 To the best of our knowledge, the Karr et al. (2016) citation does not exist, though the paper’s findings appear to be genuine. While the authors of the current paper recognize and are sensitive to the ethical concerns associated with performing this exercise (Collins, 2022; Hancock et al., 2020; Susser et al., 2018), the results offer a practical reason for investigating how AI-generated text is different from human-generated text: large language models are now effective content generators and can produce human-like language. It is presently unclear how AI-generated text differs from human-generated text, particularly when reporting on experiences (though, see Giorgi et al., 2023), which motivates our current paper.

In addition to examining emotional content across text types, other features related to linguistic style and structure are considered based on prior work. The linguistic style of AI-generated and human-generated texts will be measured using an analytic writing index composed of function words (e.g., articles, prepositions, pronouns) (Jordan et al., 2019; Pennebaker et al., 2014). Analytic writing is a proxy for complex and elaborate thinking (Markowitz, 2023a; Seraj et al., 2021), describing how one thinks and reasons rather than what one thinks or reasons about. Analytic writing has been applied to human-generated texts to evaluate a range of social and psychological dynamics such as persuasion (Markowitz, 2020a), trends in political speech (Jordan et al., 2019), need for cognition (Markowitz, 2023a), and individual differences like gender (Meier et al., 2020).
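As an illustration, the function-word logic behind such an index can be sketched in a few lines of Python. The word lists below are toy stand-ins, not the LIWC dictionaries used in the cited work, and the scoring only follows the general form reported by Pennebaker et al. (2014): categories associated with categorical thinking (articles, prepositions) raise the score, while dynamic categories (pronouns, auxiliary verbs, conjunctions, adverbs, negations) lower it.

```python
import re

# Toy function-word lists; stand-ins for the proprietary LIWC dictionaries.
ARTICLES = {"a", "an", "the"}
PREPOSITIONS = {"in", "on", "at", "of", "to", "with", "from", "by"}
PRONOUNS = {"i", "we", "you", "he", "she", "it", "they", "this", "that"}
AUX_VERBS = {"is", "are", "was", "were", "be", "have", "has", "do", "does"}
CONJUNCTIONS = {"and", "but", "or", "so", "because"}
ADVERBS = {"very", "really", "just", "quite", "too"}
NEGATIONS = {"not", "no", "never"}

def analytic_index(text: str) -> float:
    """CDI-style analytic writing score: each category's rate is its
    percentage of total words; higher scores = more analytic style."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    rate = lambda category: 100 * sum(w in category for w in words) / len(words)
    return (30
            + rate(ARTICLES) + rate(PREPOSITIONS)
            - rate(PRONOUNS) - rate(AUX_VERBS)
            - rate(CONJUNCTIONS) - rate(ADVERBS) - rate(NEGATIONS))

narrative = "I really loved it and we just had so much fun, but it was too short."
analytic = "The location of the hotel offers convenient access to the historic district."
assert analytic_index(analytic) > analytic_index(narrative)
```

A narrative sentence dense with pronouns and adverbs scores low, while a noun-heavy sentence built on articles and prepositions scores high, which is the contrast the index is designed to capture.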

A second exploratory style dimension considers the descriptive nature of AI-generated and human-generated texts. One way to evaluate the descriptiveness of a text is to assess its rate of adjectives. Language patterns with high rates of adjectives tend to be more elaborate and narrative-like compared to language patterns with low rates of adjectives (Chung & Pennebaker, 2008). Further, adjectives are a key marker of false speech (Johnson & Raye, 1981; Markowitz & Hancock, 2014) and therefore, consistent with a second aim of this paper (e.g., identifying how inherently false versus intentionally false speech is communicated when reporting on experiences), it is critical to evaluate the detailed style of different text types (AI-generated vs. human-generated). Finally, prior work suggests AI-generated messages may be less wordy than human-generated messages (Hohenstein & Jung, 2020), making the structural complexity of AI-generated text a key interest of the current paper.
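To make these style and structure measures concrete, here is a minimal, self-contained sketch. The adjective rate uses a tiny hand-built adjective list in place of a real part-of-speech tagger (which the cited work would use), and readability is scored with the standard Flesch Reading Ease formula using a rough vowel-group heuristic for syllable counting; neither the word list nor the example sentences come from the study's data.

```python
import re

# Toy adjective list; a real analysis would use a POS tagger instead.
ADJECTIVES = {"beautiful", "spacious", "clean", "comfortable", "luxurious",
              "friendly", "stunning", "cozy", "modern", "impeccable", "panoramic"}

def adjective_rate(text: str) -> float:
    """Adjectives as a percentage of total words (descriptiveness proxy)."""
    words = re.findall(r"[a-z']+", text.lower())
    return 100 * sum(w in ADJECTIVES for w in words) / max(len(words), 1)

def count_syllables(word: str) -> int:
    """Rough heuristic: count contiguous vowel groups, minimum of 1."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015*(words/sentences)
    - 84.6*(syllables/words); higher = easier to read."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[a-z']+", text.lower())
    n = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

plain = "The room was nice. The bed was big. We had fun."
ornate = ("The luxurious, spacious suite featured impeccable, modern "
          "furnishings and stunning panoramic views of the harbor.")
assert adjective_rate(ornate) > adjective_rate(plain)
assert flesch_reading_ease(plain) > flesch_reading_ease(ornate)
```

The contrast illustrates why the two measures can move together: adjective-dense description tends to use longer words and longer sentences, which simultaneously raises the descriptiveness score and lowers readability.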

Conclusion

The current paper evaluated the degree to which AI-generated text has linguistic signals that are separable from human-generated text when reporting on experiences. We presented some of the first work to demonstrate that when reporting on experiences like a hotel stay, AI-generated text from ChatGPT is inherently false and more analytic, more emotional, more descriptive, and less readable than intentionally false human-generated text. We encourage future work to examine more large language models and use other tasks to continually identify patterns that reveal linguistic differences between AI and humans, especially as the technology improves.