Proactive Moderation of Online Discussions: Existing Practices and the Potential for Algorithmic Support

Paper · arXiv 2211.16525 · Published November 29, 2022

Multiple studies on content moderation have identified a problem of scale: even if antisocial behavior is a small fraction of all content that gets posted, the sheer size of modern online platforms, together with the relatively small number of moderators present on most platforms, makes it infeasible for human moderators to keep up with all the content in need of moderation [16, 24, 47, 65]. This has led to mental strain and burnout among moderators [16] and has directly inspired calls for the development of technological assistance to reduce the burden on human moderators [64]. As Gillespie writes, “the strongest argument for the automation of content moderation may be that, given the human costs, there is simply no other ethical way to do it, even if it is done poorly” [24]. Technological responses to this call have ranged in complexity: basic tools include simple word-based filters [7, 47, 64] and blocklists [37], while more advanced systems attempt to use machine learning and natural language processing techniques to automatically identify antisocial content [20, 51, 65].
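To make the simplest end of this spectrum concrete, a basic word-based filter can be sketched in a few lines. The snippet below is purely illustrative; the blocklist contents and the normalization are placeholder assumptions, not drawn from any of the cited systems.

```python
# Minimal sketch of a word-based filter, one of the simplest moderation
# aids mentioned above. Blocklist contents are hypothetical placeholders.
BLOCKLIST = {"insult1", "insult2"}  # terms a community might choose to block

def flag_comment(text: str) -> bool:
    """Return True if the comment contains any blocklisted word."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    return not BLOCKLIST.isdisjoint(words)

if __name__ == "__main__":
    print(flag_comment("This contains insult1, sadly."))  # True
    print(flag_comment("A perfectly civil remark."))      # False
```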

4.2.1 Backend: The CRAFT Architecture. Our prototype tool is powered by a recent Natural Language Processing paradigm—conversational forecasting—which can be used to predict the future occurrence of an event in a conversation based on its current state [46, 66]. Prior work has applied this paradigm to the task of forecasting future antisocial behavior in online discussions, and found that algorithmic forecasting models are approaching the level of human intuition on this task, albeit with lower accuracy [66]. While algorithmic models are not yet at the level of human performance, they are at a level that is comparable to models already used in existing Wikipedia moderation tools: the current state-of-the-art model for forecasting future antisocial behavior, CRAFT, has a reported F1 score of 69.8% [13], which is close to that of some models used in the popular ORES moderation tool. We therefore believe that such models, while imperfect, are mature enough to be potentially useful to moderators. In light of this, we use the publicly available CRAFT model trained for forecasting future antisocial behavior on Wikipedia to power our prototype tool.
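As a rough illustration of how a forecasting model slots into a tool backend, the sketch below rescores a conversation after each new comment and flags it once the forecast crosses a threshold. The `CraftForecaster` class is a hypothetical stand-in for the released model, not its actual public interface, and its scoring method is stubbed out; only the control flow is the point.

```python
# Hedged sketch of the conversational-forecasting loop described above.
# `CraftForecaster` is a hypothetical stand-in, NOT the real CRAFT API;
# the key idea is that the conversation is rescored after every comment.
from dataclasses import dataclass, field

@dataclass
class Conversation:
    comments: list[str] = field(default_factory=list)

class CraftForecaster:
    """Hypothetical wrapper; a real forecaster would encode each comment
    with a neural architecture and output P(future antisocial behavior)."""
    def score(self, conversation: Conversation) -> float:
        return 0.0  # placeholder value, not a real forecast

def monitor(comments: list[str], forecaster: CraftForecaster,
            threshold: float = 0.5):
    """Replay a conversation comment by comment, yielding the comment
    index and score whenever the forecast crosses the alert threshold."""
    convo = Conversation()
    for i, comment in enumerate(comments):
        convo.comments.append(comment)
        s = forecaster.score(convo)
        if s >= threshold:
            yield i, s
```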

However, moderators’ ability to identify at-risk conversations to monitor is limited by the scale of the platform. P9 explains how, even within the subset of topics they are interested in and engage with, their ability to monitor conversations proactively is limited by their sheer number, forcing them to fall back on simplistic strategies, such as random discovery, to identify at-risk conversations to monitor:

P9: There are too many [potentially at-risk conversations] to proactively monitor. I know there’s about 65 or 60 ongoing ones which are certainly always going to be at risk. [...] So I usually either wait until I’m asked, or I happen to see something, or I skip around and happen to notice something.

The problem of scale is exacerbated by the inherent difficulty of determining when a conversation is in need of a proactive intervention. While every participant we interviewed believes there are some contexts in which they can foresee derailment, as described in Section 5.2.3, there is a wide range in how broad this context is and how confident participants are in their forecasts. Four participants believe that they can confidently forecast antisocial behavior in any Wikipedia context, but four others believe that they can only do so in very specific contexts with low confidence, and the last participant believes they can only make such forecasts in conversations on a handful of specific topics among discussion participants they know personally.

Moderators’ feedback on the prototype tool suggests that information presented in the tool’s Ranking View is helpful in discovering at-risk conversations, although individual moderators differed in their evaluation of exactly which pieces of information were most useful. For example, P4 reported that they would mainly use the CRAFT score to decide which conversations were worth monitoring:

P4: [For monitoring] I would just pick the ones with the highest score ’cause it seems to be somewhat accurate.

Meanwhile, other participants highlighted the score change representation (i.e., the colored arrows) as providing an easy way to get a sense of when a monitored conversation needs to be further inspected.

P7 reports:

P7: I like the score change indicator. That is useful. From a cursory glance, it looks like if the score is going up, I would inspect it, if the score was going down, maybe it is not worth inspecting.

Altogether, five participants described how both the score and the score change representation would be useful for discovering such at-risk conversations.
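Taken together, this feedback suggests a Ranking View needs two signals per conversation: the current score and its direction of change. The sketch below shows one way such a view might be assembled; the field names and arrow rendering are illustrative assumptions, not the prototype’s actual implementation.

```python
# Illustrative sketch of a Ranking View row: current CRAFT score plus a
# score-change indicator (the colored arrows participants mention).
from dataclasses import dataclass

@dataclass
class MonitoredConversation:
    title: str
    prev_score: float  # forecast at the previous refresh
    curr_score: float  # forecast after the newest comment

    @property
    def trend(self) -> str:
        """Arrow indicating whether the forecast is rising or falling."""
        if self.curr_score > self.prev_score:
            return "▲"  # rising risk: worth inspecting
        if self.curr_score < self.prev_score:
            return "▼"  # falling risk: lower priority
        return "–"

def ranking_view(conversations: list[MonitoredConversation]) -> list[str]:
    """Render rows sorted by current score, highest risk first."""
    rows = sorted(conversations, key=lambda c: c.curr_score, reverse=True)
    return [f"{c.trend} {c.curr_score:.2f}  {c.title}" for c in rows]

if __name__ == "__main__":
    convos = [
        MonitoredConversation("Article A talk page", 0.42, 0.61),
        MonitoredConversation("Article B talk page", 0.55, 0.48),
    ]
    print("\n".join(ranking_view(convos)))
```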

However, moderators also identified several aspects of conversations that play into their existing intuitions about whether to monitor a conversation, but are not captured by the prototype tool. In particular, three participants mentioned wanting to know the ages of conversations, since their intuition is that if a conversation has been inactive for an extended period of time, it is unlikely to pick up again at all, much less turn uncivil.

P7 expresses this view:

P7: That is very useful. That is probably all I would really need, too, except for the age of the conversation would also be another useful column because if the conversation was old, it wouldn’t be worth my time to investigate it anyway but if I see the last comment was added within a day or two, I would then check it out. If it was more than, like, 2 or 3 days old, I mean, the conversation is probably dead anyway so it is not worth my time.
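For illustration, the age heuristic P7 describes could be layered onto the tool as a simple staleness filter applied before ranking. The cutoff and field names below are assumptions based only on P7’s 2-to-3-day rule of thumb, not part of the prototype.

```python
# Illustrative staleness filter implementing P7's rule of thumb: if the
# newest comment is older than the cutoff, the conversation is likely
# dead and not worth a moderator's time. Names are assumptions;
# timestamps are assumed to be timezone-aware (UTC).
from datetime import datetime, timedelta, timezone

def is_stale(last_comment_at: datetime, max_age_days: int = 3,
             now: datetime | None = None) -> bool:
    """True if the conversation's newest comment exceeds the age cutoff."""
    now = now or datetime.now(timezone.utc)
    return now - last_comment_at > timedelta(days=max_age_days)

# Usage: keep only fresh conversations before ranking by CRAFT score.
# fresh = [c for c in conversations if not is_stale(c.last_comment_at)]
```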