Uncovering Latent Arguments in Social Media Messaging by Employing LLMs-in-the-Loop Strategy

Paper · arXiv 2404.10259 · Published April 16, 2024
Social Media · Argumentation · Co-Writing Collaboration · Linguistics, NLP, NLU

The widespread use of social media has led to a surge in popularity of automated methods for analyzing public opinion. Supervised methods are adept at text categorization, yet the dynamic nature of social media discussions poses a continual challenge for these techniques because the focus of discussion shifts constantly. On the other hand, traditional unsupervised methods for extracting themes from public discourse, such as topic modeling, often reveal overarching patterns that miss specific nuances. Consequently, a significant portion of research into social media discourse still depends on labor-intensive manual coding and human-in-the-loop approaches, which are both time-consuming and costly. In this work, we study the problem of discovering arguments associated with a specific theme. We propose a generic LLMs-in-the-Loop strategy that leverages the advanced capabilities of large language models (LLMs) to extract latent arguments from social media messaging. To demonstrate our approach, we apply the framework to contentious topics, using two publicly available datasets: (1) a climate campaigns dataset of 14k Facebook ads with 25 themes and (2) a COVID-19 vaccine campaigns dataset of 9k Facebook ads with 14 themes.

Initially, the framework categorizes textual instances into clusters based on their associated themes. These instances can range from social media posts to comprehensive documents. We divide each theme-based cluster into sub-clusters using a clustering algorithm. This allows for a more granular analysis of the thematic content, revealing the nuances of arguments present within each theme.
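The sub-clustering step can be sketched with a minimal k-means over instance embeddings. This is only an illustration: the paper says "a clustering algorithm" without naming one, and the 2-D toy embeddings and choice of k below are assumptions.

```python
import numpy as np

def kmeans_subclusters(embeddings, k, n_iter=50, seed=0):
    """Partition one theme cluster's instance embeddings into k sub-clusters
    (minimal k-means sketch; the paper does not fix a specific algorithm)."""
    rng = np.random.default_rng(seed)
    # initialize centroids from k randomly chosen instances
    centroids = embeddings[rng.choice(len(embeddings), k, replace=False)]
    for _ in range(n_iter):
        # assign each instance to its nearest centroid
        dists = np.linalg.norm(embeddings[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each centroid as the mean of its assigned instances
        for j in range(k):
            if (labels == j).any():
                centroids[j] = embeddings[labels == j].mean(axis=0)
    return labels, centroids

# toy theme cluster: two well-separated groups of 2-D "embeddings"
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9]])
labels, centroids = kmeans_subclusters(X, k=2)
```

Each resulting sub-cluster is then treated as one candidate argument within the theme.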

To articulate the arguments found within each sub-cluster, we employ zero-shot multi-document summarization using GPT-4 (Achiam et al., 2023) on the top-k instances: the five instances closest to each sub-cluster centroid are included in a prompt engineered to generate a short, theme-specific summary. This summarization process surfaces the key points and arguments without the need for pre-labeled data, showcasing the framework's unsupervised capabilities.
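The instance-selection step can be sketched as follows. Selecting the k nearest instances to a centroid follows directly from the text; the prompt wording is a hypothetical stand-in, since the paper does not publish its exact prompts.

```python
import numpy as np

def top_k_near_centroid(embeddings, texts, centroid, k=5):
    """Return the k instance texts closest to a sub-cluster centroid
    (the paper uses k = 5)."""
    order = np.linalg.norm(embeddings - centroid, axis=1).argsort()
    return [texts[i] for i in order[:k]]

def build_summary_prompt(theme, instances):
    # Hypothetical prompt template; the paper's engineered prompts are
    # theme-specific and not reproduced here.
    body = "\n".join(f"- {t}" for t in instances)
    return (f"Summarize the main argument about '{theme}' "
            f"shared by the following messages:\n{body}")

emb = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.1], [3.0, 3.0]])
texts = ["ad A", "ad B", "ad C", "ad D"]
closest = top_k_near_centroid(emb, texts, np.array([0.0, 0.0]), k=2)
prompt = build_summary_prompt("climate", closest)
```

The returned prompt would then be sent to GPT-4 in a zero-shot call to produce the sub-cluster summary.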

3.3 Generating and Refining Arguments

Subsequently, each sub-cluster summary serves as a prompt for LLMs in a zero-shot manner to generate specific talking points advocating for the arguments implied in the summary. This approach ensures that the extracted talking points are relevant and coherent within the context of their respective themes.
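A sketch of how a sub-cluster summary could be turned into a zero-shot talking-point prompt; the template wording and the number of talking points requested are assumptions, as the paper does not publish this prompt.

```python
def build_talking_point_prompt(theme, summary, n_points=3):
    """Wrap a sub-cluster summary in a zero-shot instruction asking an LLM
    for talking points (hypothetical template, not the paper's exact prompt)."""
    return (f"The following summarizes social media ads about '{theme}':\n"
            f"{summary}\n"
            f"Generate {n_points} concise talking points advocating for the "
            f"arguments implied above.")

p = build_talking_point_prompt("climate", "Ads urge support for clean energy jobs.")
```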

To refine the generated arguments, we implement a redundancy check that identifies and merges similar talking points. This process assesses the similarity between pairs of talking points and consolidates those whose similarity exceeds a predefined threshold.
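The redundancy check can be sketched as a greedy pass over embedded talking points. The cosine metric, the greedy keep-first consolidation, and the threshold value are assumptions; the paper only states that pairs above a predefined similarity threshold are merged.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def merge_redundant(points, vectors, threshold=0.95):
    """Greedy redundancy check: keep a talking point only if its similarity
    to every already-retained point stays below the threshold (sketch; the
    paper does not specify the exact consolidation procedure)."""
    kept, kept_vecs = [], []
    for p, v in zip(points, vectors):
        if all(cosine(v, kv) < threshold for kv in kept_vecs):
            kept.append(p)
            kept_vecs.append(v)
    return kept

points = ["TP1: economic growth", "TP1 rephrased", "TP2: clean energy jobs"]
vecs = [np.array([1.0, 0.0]), np.array([0.99, 0.1]), np.array([0.0, 1.0])]
kept = merge_redundant(points, vecs, threshold=0.95)
```

Here the second point is absorbed into the first because their similarity exceeds the threshold, while the third survives as a distinct argument.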

Here are two examples of talking points that we merged because their similarity exceeded the threshold. TP1: "The Build Back Better Act is crucial for economic growth, job creation, and addressing the climate crisis."; TP2: "Legislative support for the Build Back Better Act, emphasizing its benefits for clean energy jobs and climate action."