Conversations Gone Awry: Detecting Early Signs of Conversational Failure


One of the main challenges online social systems face is the prevalence of antisocial behavior, such as harassment and personal attacks. In this work, we introduce the task of predicting from the very start of a conversation whether it will get out of hand. As opposed to detecting undesirable behavior after the fact, this task aims to enable early, actionable prediction at a time when the conversation might still be salvaged.

To this end, we develop a framework for capturing pragmatic devices—such as politeness strategies and rhetorical prompts—used to start a conversation, and analyze their relation to its future trajectory. Applying this framework in a controlled setting, we demonstrate the feasibility of detecting early warning signs of antisocial behavior in online discussions.

As humans, we have some intuition about which conversation is more likely to derail. We may note the repeated, direct questioning with which A1 opens the exchange, and that A2 replies with yet another question. In contrast, B1’s softer, hedged approach (“it seems”, “I don’t think”) appears to invite an exchange of ideas, and B2 actually addresses the question instead of stonewalling. Could we endow artificial systems with such intuitions about the future trajectory of conversations?

In this work we aim to computationally capture linguistic cues that predict a conversation’s future health. Most existing conversation modeling approaches aim to detect characteristics of an observed discussion or predict the outcome after the discussion concludes, e.g., whether it involves a present dispute (Allen et al., 2014; Wang and Cardie, 2014) or contributes to the eventual solution of a problem (Niculae and Danescu-Niculescu-Mizil, 2016). In contrast, for this new task we need to discover interactional signals of the future trajectory of an ongoing conversation.

Recent studies have computationally operationalized prior formulations of politeness by extracting linguistic cues that reflect politeness strategies (Danescu-Niculescu-Mizil et al., 2013; Aubakirova and Bansal, 2016). Such research has additionally tied politeness to social factors such as individual status (Danescu-Niculescu-Mizil et al., 2012; Krishnan and Eisenstein, 2015) and to the success of requests (Althoff et al., 2014) or of collaborative projects (Ortu et al., 2015). However, to the best of our knowledge, this is the first computational investigation of the relation between politeness strategies and the future trajectory of the conversations in which they are deployed.

In this controlled setting, we find that pragmatic cues extracted from the very first exchange in a conversation (i.e., the first comment-reply pair) can indeed provide some signal of whether the conversation will subsequently go awry. For example, conversations prompted by hedged remarks sustain their initial civility more so than those prompted by forceful questions, or by direct language addressing the other interlocutor. In summary, our main contributions are:

• We articulate the new task of detecting early on whether a conversation will derail into personal attacks;

• We devise a controlled setting and build a labeled dataset to study this phenomenon;

• We investigate how politeness strategies and other rhetorical devices are tied to the future trajectory of a conversation.

We now describe our framework for capturing linguistic cues that might inform a conversation’s future trajectory. Crucially, given our focus on conversations that start seemingly civil, we do not expect overtly hostile language—such as insults (Yin et al., 2009)—to be informative. Instead, we seek to identify pragmatic markers within the initial exchange of a conversation that might serve to reveal or exacerbate underlying tensions that eventually come to the fore, or conversely suggest sustainable civility. In particular, in this work we explore how politeness strategies and rhetorical prompts reflect the future health of a conversation.

Politeness strategies. Politeness can reflect a-priori good will and help navigate potentially face-threatening acts (Goffman, 1955; Lakoff, 1973), and also offers hints to the underlying intentions of the interlocutors (Fraser, 1980). Hence, we may naturally expect certain politeness strategies to signal that a conversation is likely to stay on track, while others might signal derailment.

In particular, we consider a set of pragmatic devices signaling politeness drawn from Brown and Levinson (1987). These linguistic features reflect two overarching types of politeness. Positive politeness strategies encourage social connection and rapport, perhaps serving to maintain cohesion throughout a conversation; such strategies include gratitude (“thanks for your help”), greetings (“hey, how is your day so far”) and use of “please”, both at the start (“Please find sources for your edit...”) and in the middle (“Could you please help with...?”) of a sentence. Negative politeness strategies serve to dampen an interlocutor’s imposition on an addressee, often by conveying indirectness or uncertainty on the part of the commenter. Both commenters in example B (Fig. 1) employ one such strategy, hedging, perhaps seeking to soften an impending disagreement about a source’s reliability (“I don’t think...”, “I would assume...”). We also consider markers of impolite behavior, such as the use of direct questions (“Why’s there no mention of it?”) and sentence-initial second person pronouns (“Your sources don’t matter...”), which may serve as forceful-sounding contrasts to negative politeness markers. Following Danescu-Niculescu-Mizil et al. (2013), we extract such strategies by pattern matching on the dependency parses of comments.
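To make the strategies above concrete, the following is a minimal sketch of how a few of them might be flagged in a single comment. It is not the extraction procedure described here: it substitutes simple surface regular expressions for matching on dependency parses, and the pattern lists, the function name `politeness_markers`, and the feature names are all hypothetical choices made for illustration.

```python
import re

# Hypothetical surface patterns chosen for illustration only; the paper
# matches on dependency parses, which this sketch does not reproduce.
SENTENCE_SPLIT = re.compile(r"(?<=[.!?])\s+")
HEDGE = re.compile(r"\b(i (don't )?think|it seems|i would assume|perhaps|maybe)\b")
GRATITUDE = re.compile(r"\b(thanks|thank you)\b")
GREETING = re.compile(r"(hi|hey|hello)\b")
PLEASE_START = re.compile(r"please\b")
PLEASE_MID = re.compile(r"\S\s+please\b")
WH_START = re.compile(r"(why|what|how|who|where|when)\b")
SECOND_PERSON_START = re.compile(r"(you|your)\b")
FIRST_PERSON_START = re.compile(r"(i|we)\b")


def politeness_markers(comment):
    """Return a dict of boolean marker flags for one comment."""
    # Normalize curly apostrophes and case before matching.
    text = comment.replace("\u2019", "'").lower()
    sentences = [s.strip() for s in SENTENCE_SPLIT.split(text) if s.strip()]
    return {
        # Positive politeness
        "gratitude": any(GRATITUDE.search(s) for s in sentences),
        "greeting": any(GREETING.match(s) for s in sentences),
        "please_start": any(PLEASE_START.match(s) for s in sentences),
        "please_mid": any(PLEASE_MID.search(s) for s in sentences),
        # Negative politeness
        "hedge": any(HEDGE.search(s) for s in sentences),
        # Directness markers
        "direct_question": any(
            WH_START.match(s) and s.endswith("?") for s in sentences
        ),
        "second_person_start": any(SECOND_PERSON_START.match(s) for s in sentences),
        "first_person_start": any(FIRST_PERSON_START.match(s) for s in sentences),
    }


if __name__ == "__main__":
    print(politeness_markers("Why's there no mention of it? Your sources don't matter."))
```

Surface patterns like these will miss many cases that a parse-based matcher would catch (and will fire on some it would reject), so the sketch is best read as a description of the feature space rather than a usable extractor.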

Focusing on the first comment (represented as ♦s), we find a rough correspondence between linguistic directness and the likelihood of future personal attacks. In particular, comments which contain direct questions, or exhibit sentence-initial “you” (i.e., “2nd person start”), tend to start awry-turning conversations significantly more often than ones that stay on track (both p < 0.001). This effect coheres with our intuition that directness signals some latent hostility from the conversation’s initiator, and perhaps reinforces the forcefulness of contentious impositions (Brown and Levinson, 1987). This interpretation is also suggested by the relative propensity of the factual check prompt, which tends to cue disputes regarding an article’s factual content (p < 0.05).

In contrast, comments which initiate on-track conversations tend to contain gratitude (p < 0.05) and greetings (p < 0.001), both positive politeness strategies. Such conversations are also more likely to begin with coordination prompts (p < 0.05), signaling active efforts to foster constructive teamwork. Negative politeness strategies are salient in on-track conversations as well, reflected by the use of hedges (p < 0.01) and opinion prompts (p < 0.05), which may serve to soften impositions or factual contentions (Hübler, 1983).

These effects are echoed in the second comment, i.e., the first reply (represented as !s). Interestingly, in this case we note that the difference in pronoun use is especially marked. First replies in conversations that eventually derail tend to contain more second person pronouns (p < 0.001), perhaps signifying a replier pushing back to contest the initiator; in contrast, on-track conversations have more sentence-initial I/We (i.e., “1st person start”, p < 0.001), potentially indicating the replier’s willingness to step into the conversation and work with, rather than argue against, the initiator (Tausczik and Pennebaker, 2010).
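One way a comparison of this kind could be tested is sketched below. The text here does not say which test produced the reported p values, so the choice of a two-sided exact sign test is an assumption, as is the premise that each awry-turning conversation is paired with one on-track conversation. The function name `sign_test_p` and the counts in the usage lines are invented for illustration and are not results from the paper.

```python
from math import comb


def sign_test_p(n_awry_only, n_ontrack_only):
    """Two-sided exact sign test on discordant pairs.

    n_awry_only:    pairs where only the awry-turning conversation shows the marker
    n_ontrack_only: pairs where only the on-track conversation shows the marker
    Pairs where both or neither show the marker carry no information and are ignored.
    """
    n = n_awry_only + n_ontrack_only
    if n == 0:
        return 1.0
    k = min(n_awry_only, n_ontrack_only)
    # Probability of a split at least this lopsided under Binomial(n, 0.5),
    # doubled for the two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(sign_test_p(10, 0))
    print(sign_test_p(5, 5))
```

Under these assumptions, a small p value indicates that the marker appears on one side of the pairs more often than chance alone would explain.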

Distinguishing interlocutor behaviors. Are the linguistic signals we observe solely driven by the eventual attacker, or do they reflect the behavior of both actors? To disentangle the attacker’s and non-attacker’s roles in the initial exchange, we examine their language use in these two possible cases: when the future attacker initiates the conversation, or is the first to reply. In attacker-initiated conversations (Figure 2B, 608 conversations), we see that both actors exhibit a propensity for the linguistically direct markers (e.g., direct questions) that tend to signal future attacks. Some of these markers are used particularly often by the non-attacking replier in awry-turning conversations (e.g., second person pronouns, p < 0.001, ○s), further suggesting the dynamic of the replier pushing back at, and perhaps even escalating, the attacker’s initial hint of aggression. Among conversations initiated instead by the non-attacker (Figure 2C, 662 conversations), the non-attacker’s linguistic behavior in the first comment (○s) is less distinctive from that of initiators in the on-track setting (i.e., log-odds ratios closer to 0); markers of future derailment are (unsurprisingly) more pronounced once the eventual attacker (▽s) joins the conversation in the second comment.
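The log-odds ratios referred to above can be illustrated with a short sketch. It assumes the ratio compares the odds of a marker appearing in awry-turning conversations against the odds in on-track ones. The additive smoothing of 0.5, the function name `log_odds_ratio`, and the counts in the usage lines are assumptions introduced for illustration, not values or choices taken from the paper.

```python
import math


def log_odds_ratio(awry_with, awry_total, ontrack_with, ontrack_total, smoothing=0.5):
    """Log-odds ratio of a marker in awry-turning vs. on-track conversations.

    Positive values mean the marker is relatively more common in awry-turning
    conversations; values near 0 mean little difference between the two groups.
    The smoothing constant (an assumption) avoids division by zero when a
    marker never or always occurs in one group.
    """
    a = awry_with + smoothing                      # awry, marker present
    b = awry_total - awry_with + smoothing         # awry, marker absent
    c = ontrack_with + smoothing                   # on-track, marker present
    d = ontrack_total - ontrack_with + smoothing   # on-track, marker absent
    return math.log((a / b) / (c / d))


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(log_odds_ratio(60, 100, 40, 100))  # positive: more common in awry
    print(log_odds_ratio(40, 100, 60, 100))  # negative: more common in on-track
```

On this reading, a marker whose ratio sits closer to 0 for one speaker role than for the other is less distinctive of derailment when used by that speaker.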

More broadly, these results reveal how different politeness strategies and rhetorical prompts deployed in the initial stages of a conversation are tied to its future trajectory.