Modeling Appropriate Language in Argumentation

Online discussion moderators must make ad hoc decisions about whether the contributions of discussion participants are appropriate or should be removed to maintain civility. Existing research on offensive language and the resulting tools cover only one aspect among many involved in such decisions. The question of what is considered appropriate in a controversial discussion has not yet been systematically addressed. In this paper, we operationalize appropriate language in argumentation for the first time. In particular, we model appropriateness through the absence of flaws, grounded in research on argument quality assessment, especially in aspects from rhetoric. From these, we derive a new taxonomy of 14 dimensions that determine inappropriate language in online discussions. Building on three argument quality corpora, we then create a corpus of 2191 arguments annotated for the 14 dimensions. Empirical analyses support that the taxonomy covers the concept of appropriateness comprehensively, showing several plausible correlations with argument quality dimensions.

People have varying degrees of sensitivity to controversial issues and may be triggered by different emotional responses depending on the issue and the opponents’ arguments (Walton, 2010). This often makes it hard to maintain a constructive discussion. In competitive debates, a moderator ensures that participants argue appropriately. Debating culture, dating back to the 18th century, demands appropriate behavior, such as staying on topic and avoiding overly emotional language.

While the notion of appropriateness is treated in argumentation theory as an important sub-dimension of argument quality (see Section 2), there has been no systematic study of appropriateness.

Given the new corpus, we analyze correlations between the 14 dimensions and the argument quality dimensions in the source corpora in Section 5. Several plausible correlations support that our taxonomy successfully aligns with the theoretical and practical quality aspects modeled in previous work.

The notion of appropriateness has been explored in several sub-disciplines of linguistics. In communicative competence research, Hymes et al. (1972) considered the knowledge about cultural norms as a requirement to produce appropriate speech, which is a central part of acquiring communicative competence. Defining sociolinguistics, Ranney (1992) linked appropriateness to the notion of politeness that is required in various social settings. Later, Schneider (2012) argued that appropriateness is a more salient notion than politeness as it explicitly accounts for the context. Some of these cultural speech properties were identified as linguistic etiquette by Jdetawy and Hamzah (2020), including correct, accurate, logical, and pure language.

Wachsmuth et al. (2017b) only provided a relatively shallow definition of appropriateness that requires a simultaneous assessment of three properties, namely the creation of credibility and emotions as well as proportionality to the issue.

We selected the dimensions that correlated most with appropriateness according to Pearson’s r. These include the four other sub-dimensions of rhetorical effectiveness, namely credibility (.49), emotional appeal (.30), clarity (.45), and arrangement (.48), as well as local acceptability (.54), a sub-dimension of logical cogency, and global acceptability (.59), a sub-dimension of dialectical reasonableness.
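The selection step can be illustrated with a minimal sketch, assuming each quality dimension is available as a list of per-argument scores. The function names `pearson_r` and `rank_by_correlation` are hypothetical, and the toy scores are invented for illustration only; they are not taken from the corpus and do not reproduce the coefficients reported above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r between two equally long sequences of numeric scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def rank_by_correlation(target, candidates):
    """Sort candidate dimensions by their correlation with the target, highest first."""
    scored = [(name, pearson_r(target, scores)) for name, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Invented toy scores (assumption: one score per argument on a 1-3 scale).
toy_appropriateness = [1, 2, 3, 2, 1, 3]
toy_candidates = {
    "credibility": [1, 2, 3, 2, 2, 3],
    "clarity": [2, 2, 3, 1, 1, 3],
    "unrelated": [3, 1, 2, 2, 3, 1],
}

ranking = rank_by_correlation(toy_appropriateness, toy_candidates)
for name, r in ranking:
    print(f"{name}: r = {r:.2f}")
```

Dimensions at the top of such a ranking would be the candidates for the manual analysis; where to cut the list off is a judgment call that this sketch does not encode.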

We manually analyzed arguments by contrasting pairs of arguments with and without low appropriateness to find patterns that describe what drives the low appropriateness levels within these dimensions.

3.2 Defining Inappropriateness

The findings from our analysis led to four core inappropriateness dimensions in our taxonomy: We deem an argument inappropriate (in light of its discussion context) if it is missing commitment of its author to the discussion, uses toxic emotions, is missing intelligibility, or seems inappropriate for other reasons.

Toxic Emotions

We model toxic emotions based on the emotional fallacies identified by Walton (2010): ad populum, ad misericordiam, ad baculum, and ad hominem. We merged these four into a single sub-dimension called emotional deception based on the results of a pilot annotation study (Section 4). Additionally, we define a sub-dimension excessive intensity to address overly intense emotions. In particular, our analysis revealed the presence of a subset of propaganda errors, including loaded language, flag-waving, repetition, exaggeration, and minimization (Da San Martino et al., 2020).

Missing Commitment

This dimension resembles the credibility dimension of Wachsmuth et al. (2017b), but it differs in that we do not mandate arguments to come from or include a trusted source. Rather, the arguments should demonstrate the participant’s general interest in participating in the debate. To formalize this concept, we drew on the five rules for “A Good Dialogue” (Walton, 1999) to create two sub-dimensions of commitment, missing seriousness and missing openness, by examining the extent to which they apply to the arguments identified in the overlap analysis.

Missing Intelligibility

The core dimension missing intelligibility results from the overlap analysis of the clarity and arrangement dimensions of Wachsmuth et al. (2017b). We found that the main point of an argument was partly unclear either due to (un)intentional vagueness or overly (un)complex language, which we refer to in our taxonomy as the sub-dimension unclear meaning. Also, derailing a discussion to another issue is a common problem (represented by the sub-dimension missing relevance). Finally, in some cases the individual claims and premises were intelligible but not their connection. We refer to this as confusing reasoning.

Other Reasons

This dimension accounts for reasons that do not fit into the other core dimensions. As part of this, we observed that some arguments have a detrimental orthography, limiting intelligibility in some cases (spelling or grammatical errors) or increasing emotions in others (capital letters, repeated exclamation points). We leave any other case of inappropriateness as reason unclassified.
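The taxonomy described in this section can be written down as a small data structure. The following is a minimal sketch, not the authors' implementation: the names `TAXONOMY`, `all_dimensions`, `core_flags`, and `is_inappropriate` are hypothetical, and the roll-up rule (a core dimension counts as flagged when any of its sub-dimensions is flagged) is an assumption made for illustration. Counting the overall inappropriateness label together with the four core dimensions and nine sub-dimensions yields 14, which is one reading of the "14 dimensions" mentioned in the abstract.

```python
# Core dimensions mapped to their sub-dimensions, as named in the text.
TAXONOMY = {
    "toxic emotions": ["excessive intensity", "emotional deception"],
    "missing commitment": ["missing seriousness", "missing openness"],
    "missing intelligibility": [
        "unclear meaning",
        "missing relevance",
        "confusing reasoning",
    ],
    "other reasons": ["detrimental orthography", "reason unclassified"],
}


def all_dimensions():
    """List the overall label, the core dimensions, and their sub-dimensions."""
    dims = ["inappropriateness"]
    for core, subs in TAXONOMY.items():
        dims.append(core)
        dims.extend(subs)
    return dims


def core_flags(sub_labels):
    """Map each core dimension to True if any of its sub-dimensions is flagged.

    Assumption: core labels are derived from sub-dimension labels; the source
    text does not state how the two levels are combined.
    """
    known = {sub for subs in TAXONOMY.values() for sub in subs}
    unknown = set(sub_labels) - known
    if unknown:
        raise ValueError(f"unknown sub-dimensions: {sorted(unknown)}")
    return {
        core: any(sub in sub_labels for sub in subs)
        for core, subs in TAXONOMY.items()
    }


def is_inappropriate(sub_labels):
    """An argument is deemed inappropriate if any core dimension applies."""
    return any(core_flags(sub_labels).values())


example = {"excessive intensity", "detrimental orthography"}
print(len(all_dimensions()))
print(core_flags(example))
print(is_inappropriate(example))
```

Keeping the hierarchy in one mapping makes it easy to validate annotation labels against the taxonomy and to aggregate sub-dimension labels to the core level under whatever roll-up rule is actually used.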