Characterizing Online Discussion Using Coarse Discourse Sequences

As more social interaction takes place online, researchers have become interested in studying the discourse occurring in online social media. From these studies, researchers can examine how people conduct conversations and arguments (Hasan and Ng 2014; Tan et al. 2016) as well as extract information for applications such as search (Cong et al. 2008). While many studies have focused their analyses on metadata surrounding community discussions, other studies have attempted to analyze the textual content of discussions. This is difficult, however, because language and interactions are complex and vary from discussion to discussion and from community to community.

One method for understanding discussion is through analyzing the high-level discourse structures inherent within conversations. Much research has demonstrated the power of using discourse acts, also known as speech acts, which categorize utterances by their role in the discussion (e.g. “question” or “answer”). Researchers have used discourse acts towards applications such as building conversational bots (Allen, Ferguson, and Stent 2001) and summarizing spoken discourse (Murray et al. 2006). However, a great deal of research using discourse acts has focused solely on extracting questions and answers (Hong and Davison 2009) or has considered only help or technical-support communities (Kim, Wang, and Baldwin 2010).

In this work, we develop a richer categorization of discourse acts towards characterizing a wide range of discussions from a variety of communities.

Because discourse acts are usually understood in relation to another piece of discourse, we collected both the discourse act of each comment and its discourse relation, that is, a link to the prior comment it is responding to, if one exists. For instance, an ANSWER is always related to a prior QUESTION. Some categories, such as a new QUESTION or an ANNOUNCEMENT, need not relate to another comment. In some categorizations of discourse, such as RST (Mann and Thompson 1988), there are only discourse relations, and the relations themselves are grouped into named categories. In our case, we do not name types of discourse relations explicitly; instead, they are implicitly inferred from the discourse acts they link. For instance, a hypothetical discourse relation “Answers” would always link ANSWER to QUESTION.
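
Since relation types are never named, they can be read off the pair of acts that a link connects. Below is a minimal sketch, assuming each comment stores its act and the id of the comment it responds to; the `Comment` class, the `relations` helper, and the fallback `CHILD->PARENT` labels are hypothetical illustrations, and only the “Answers” label comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical record: one annotated comment and the prior comment it responds to.
@dataclass
class Comment:
    comment_id: str
    act: str                  # discourse act, e.g. "QUESTION" or "ANSWER"
    parent_id: Optional[str]  # id of the linked prior comment, or None if unlinked

# Only "Answers" is named in the text; any other act pair gets a generic label.
NAMED_RELATIONS = {("ANSWER", "QUESTION"): "Answers"}

def implicit_relation(child_act: str, parent_act: str) -> str:
    """Infer a relation label purely from the two acts that the link connects."""
    return NAMED_RELATIONS.get((child_act, parent_act), f"{child_act}->{parent_act}")

def relations(thread: list[Comment]) -> list[tuple[str, str, str]]:
    """Return (child_id, parent_id, relation) for every linked comment in the thread."""
    by_id = {c.comment_id: c for c in thread}
    out = []
    for c in thread:
        if c.parent_id is None:
            continue  # e.g. an initial QUESTION or an ANNOUNCEMENT
        parent = by_id[c.parent_id]
        out.append((c.comment_id, parent.comment_id, implicit_relation(c.act, parent.act)))
    return out

thread = [
    Comment("c1", "QUESTION", None),
    Comment("c2", "ANSWER", "c1"),
    Comment("c3", "APPRECIATION", "c2"),
]
print(relations(thread))
# [('c2', 'c1', 'Answers'), ('c3', 'c2', 'APPRECIATION->ANSWER')]
```

Because the label is a function of the act pair alone, storing each comment's act and its link is enough; no separate relation annotation is needed.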

Discourse Act Definitions

Detailed information about each discourse act and the relations it allows is given below. For our annotators, we provided a longer tutorial and several examples for each act, which we will release with our dataset.

QUESTION: A comment with a question or a request seeking some form of feedback, help, or other kinds of responses. While the comment may contain a question mark, it is not required. For instance, it might be posed in the form of a statement but still solicit a response. Conversely, not everything that has a question mark is automatically a QUESTION; rhetorical questions, for instance, are not seeking a response.

Relation: This comment might be the first in a thread and have no relation to another comment. Or, it could be a clarifying or follow-up QUESTION linking to any prior comment.

ANSWER: A comment that is responding to a QUESTION by answering the question or fulfilling the request. There can be more than one ANSWER responding to a QUESTION.

Relation: An ANSWER is always linked to a QUESTION.

ANNOUNCEMENT: A comment that is presenting some new information to the community, such as a piece of news, a link to something, a story, an opinion, a review, or insight.

Relation: This comment has no relation to a prior comment and is always the initial post in a thread.

AGREEMENT: A comment that is expressing agreement with some information presented in a prior comment. It can be agreeing with a point made, providing supporting evidence, providing a positive example or experience, or confirming or acknowledging a point made.

Relation: This comment is always linked to a prior comment to which it is agreeing.

APPRECIATION: A comment that is expressing thanks, appreciation, excitement, or praise in response to another comment. In contrast to AGREEMENT, it does not evaluate the merits of the points brought up. Comments of this category are interpersonal rather than informational.

Relation: This comment is always linked to a prior comment for which it is expressing appreciation.

DISAGREEMENT: A comment that is correcting, criticizing, contradicting, or objecting to a point made in a prior comment. It can also be providing evidence to support its disagreement, such as an example or contrary anecdote.

Relation: This comment is always linked to a prior comment to which it is disagreeing.

NEGATIVE REACTION: A comment that is expressing a negative reaction to a previous comment, such as attacking or mocking the commenter, or expressing emotions such as disgust, derision, or anger toward the contents of the prior comment.

This comment is not discussing the merits of the points made in a prior comment or trying to correct them.

Relation: This comment is always linked to a prior comment to which it is negatively reacting.

ELABORATION: A comment that is adding information to another comment. Often, one can imagine it simply appended to the end of the comment it elaborates on. Many kinds of comments can be elaborated on: for instance, a question-asker may elaborate on their question to provide more context, or someone may elaborate on an answer to add more information.

Relation: This comment is always linked to a prior comment upon which it is elaborating.

HUMOR: A comment that is primarily a joke, a piece of sarcasm, or a pun intended to get a laugh or be silly, not to add information. If a comment uses sarcasm to make a point or provide feedback, it may belong in a different category.

Relation: At times, this comment links to another comment, but at other times it may not be responding to anything.
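
The relation constraints in these definitions are regular enough to check mechanically, which might help when cleaning annotations. Below is a minimal sketch, assuming a thread arrives as (comment_id, act, parent_id) triples in posting order; the triple format, the `LINK_RULE` table, and `check_thread` are hypothetical illustrations, not the dataset's actual format or tooling. As a simplification, it enforces a position only for ANNOUNCEMENT and lets an unlinked QUESTION or HUMOR comment appear anywhere.

```python
from typing import Optional

# Link rules transcribed from the definitions above (hypothetical encoding):
# "required" = must link to a prior comment, "optional" = may or may not,
# "none" = must not link to another comment.
LINK_RULE = {
    "QUESTION": "optional",
    "ANSWER": "required",
    "ANNOUNCEMENT": "none",
    "AGREEMENT": "required",
    "APPRECIATION": "required",
    "DISAGREEMENT": "required",
    "NEGATIVE REACTION": "required",
    "ELABORATION": "required",
    "HUMOR": "optional",
}

def check_thread(thread: list[tuple[str, str, Optional[str]]]) -> list[str]:
    """Return rule violations for (comment_id, act, parent_id) triples in posting order."""
    errors = []
    seen: dict[str, str] = {}  # comment_id -> act, for comments already posted
    for position, (cid, act, parent) in enumerate(thread):
        rule = LINK_RULE.get(act)
        if rule is None:
            errors.append(f"{cid}: unknown act {act!r}")
        elif rule == "required" and parent is None:
            errors.append(f"{cid}: {act} must link to a prior comment")
        elif rule == "none" and parent is not None:
            errors.append(f"{cid}: {act} must not link to another comment")
        if act == "ANNOUNCEMENT" and position != 0:
            errors.append(f"{cid}: ANNOUNCEMENT must be the initial post")
        if parent is not None:
            if parent not in seen:
                errors.append(f"{cid}: parent {parent} is not a prior comment")
            elif act == "ANSWER" and seen[parent] != "QUESTION":
                errors.append(f"{cid}: ANSWER must link to a QUESTION")
        seen[cid] = act
    return errors

good = [
    ("c1", "QUESTION", None),
    ("c2", "ANSWER", "c1"),
    ("c3", "DISAGREEMENT", "c2"),
    ("c4", "HUMOR", None),
]
bad = [
    ("c1", "QUESTION", None),
    ("c2", "ANNOUNCEMENT", None),
    ("c3", "ANSWER", "c2"),
]
print(check_thread(good))  # []
print(check_thread(bad))
# ['c2: ANNOUNCEMENT must be the initial post', 'c3: ANSWER must link to a QUESTION']
```

Requiring the parent to appear earlier in posting order mirrors the definitions' wording that a comment links to a *prior* comment.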