Exploring the Potential of Large Language Models in Computational Argumentation
Computational argumentation has become an essential tool in various domains, including law, public policy, and artificial intelligence. It is an emerging research field in natural language processing that has attracted increasing attention. Research on computational argumentation mainly involves two types of tasks: argument mining and argument generation. As large language models (LLMs) have demonstrated impressive capabilities in understanding context and generating natural language, it is worthwhile to evaluate their performance on diverse computational argumentation tasks.
We present a new benchmark dataset on counter speech generation that aims to holistically evaluate the end-to-end performance of LLMs on argument mining and argument generation. Extensive experiments show that LLMs exhibit commendable performance across most of the datasets, demonstrating their capabilities in the field of argumentation.
Researchers have dedicated considerable effort to two distinct directions (Chakrabarty et al., 2019; Cheng et al., 2021; Alshomary et al., 2021; Bilu et al., 2019). The first direction, argument mining, focuses on understanding unstructured texts and automatically extracting various argumentative elements (Cabrio and Villata, 2018; Levy et al., 2014a; Rinott et al., 2015; Cheng et al., 2022). The other direction is argument generation, which aims to generate argumentative texts based on external knowledge (Hua et al., 2019; Schiller et al., 2020) or to summarize key argument points (Syed et al., 2021; Roush and Balaji, 2020).
Unlike classical structured prediction NLP tasks such as named entity recognition, which typically take a single sentence as input and extract token-level information, computational argumentation tasks require discourse-level comprehension. This requirement makes it challenging and laborious to gather a large volume of labeled data for training, hindering research progress in this field.
To bridge this gap, we first categorize current computational argumentation tasks into two primary classes comprising six distinct categories, and establish a standardized format and evaluation metrics for fourteen openly available datasets. Second, existing tasks and datasets focus on either argument mining or argument generation. To take a holistic approach, we propose a new task that integrates both: generating counter speeches in response to debate speeches, which typically advocate a particular stance.
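As an illustration of what a standardized format can look like, the sketch below shows a unified record layout that covers both classification-style mining tasks and free-text generation tasks. All field names here are our own assumptions for illustration, not the benchmark's actual schema.

```python
# Hypothetical unified record format for argumentation datasets;
# field names are illustrative assumptions, not the paper's actual schema.
claim_detection_example = {
    "task": "claim_detection",          # one of the task categories
    "input": "School uniforms reduce bullying and should be mandatory.",
    "label": "claim",                   # gold annotation
    "options": ["claim", "non-claim"],  # label space for classification tasks
}

counter_speech_example = {
    "task": "counter_speech_generation",
    "input": "A debate speech arguing FOR a motion ...",
    "label": "A reference counter speech ...",  # free-text target for generation
    "options": None,                            # generation tasks have no label set
}

def to_prompt(record):
    """Render a unified record as a zero-shot prompt for an LLM (illustrative)."""
    if record["options"]:
        choices = " or ".join(record["options"])
        return f"Classify the following text as {choices}:\n{record['input']}"
    return f"Write a counter speech responding to:\n{record['input']}"
```

Keeping every dataset in one record shape like this lets a single prompting and evaluation loop iterate over all fourteen datasets without per-task glue code.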
This task requires the model to understand the argumentative structures in the supporting speech while generating a counter speech against its proposition. To facilitate this study, we construct a new document-to-document counterargument generation benchmark.
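One way such an end-to-end task can be approached is a two-step pipeline: first mine the claims from the supporting speech, then generate a rebuttal conditioned on them. The sketch below assumes a generic chat-completion client; `call_llm` and both prompt templates are hypothetical stand-ins, not the paper's method.

```python
# Sketch of a two-step counter-speech pipeline: mine claims, then rebut them.
# `call_llm` is a hypothetical placeholder for any chat-completion client.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

MINE_TEMPLATE = (
    "List the main claims made in the following speech, one per line:\n{speech}"
)
COUNTER_TEMPLATE = (
    "You are debating AGAINST the motion. Write a counter speech that "
    "rebuts each of these claims:\n{claims}"
)

def generate_counter_speech(supporting_speech: str, llm=call_llm) -> str:
    claims = llm(MINE_TEMPLATE.format(speech=supporting_speech))  # argument mining step
    return llm(COUNTER_TEMPLATE.format(claims=claims))            # argument generation step
```

Passing the LLM client as a parameter also makes the pipeline easy to unit-test with a stub in place of a real model.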