Event-Aware Sentiment Factors from LLM-Augmented Financial Tweets: A Transparent Framework for Interpretable Quant Trading

Paper · arXiv 2508.07408 · Published August 10, 2025

In this study, we demonstrate the utility of large language models (LLMs) for financial semantic annotation and alpha signal discovery. Working from a corpus of company-related tweets, we use an LLM to automatically assign multi-label event categories to high-sentiment-intensity tweets. We align these labeled sentiment signals with forward returns over 1-to-7-day horizons to evaluate their statistical efficacy and market tradability. Our experiments reveal that certain event labels consistently yield negative alpha, with Sharpe ratios as low as -0.38 and information coefficients exceeding 0.05, all statistically significant at the 95% confidence level. This study establishes the feasibility of transforming unstructured social media text into structured, multi-label event variables. A key contribution of this work is its commitment to transparency and reproducibility; all code and methodologies are made publicly available. Our results provide evidence that social media sentiment is a valuable, albeit noisy, signal in financial forecasting and underscore the potential of open-source frameworks to democratize algorithmic trading research.

This paper addresses this explanatory gap by introducing a novel framework that integrates Large Language Models (LLMs) with quantitative factor modeling to move beyond coarse sentiment and extract interpretable, event-driven factors from social media. Our central thesis is that the true value of social media data lies not just in its emotional intensity but in its rich semantic structure. We posit that by using an LLM to automatically assign multi-label event categories to high-intensity tweets—such as identifying discussions related to rumor/speculation, retail investor hype, or brand boycotts—we can construct more robust and interpretable predictive signals.

A core innovation in our pipeline is augmenting each tweet with both sentiment intensity and multi-label semantic tags. This allows us to move beyond binary polarity and construct interpretable, event-level predictors.

Sentiment Polarity (Net Tone). Each tweet is assigned a continuous sentiment score, referred to as net tone, reflecting the directional emotional intensity of the text. We adopt the approach of Sowinska et al., where a stacked LDA topic model followed by logistic regression is trained to predict forward returns, thereby generating polarity scores aligned with market response. Our modular framework also supports LLM-prompted polarity scoring for alternative use cases.

Multi-Label Event Tagging via LLM. To extract interpretable event-level semantics, we use a commercial-grade LLM to perform zero-shot, multi-label classification. The LLM is prompted with each tweet and a curated dictionary of 70+ financially relevant event types (e.g., Rumor/Speculation, Retail Investor Buzz, Brand Boycott) and assigns one or more applicable labels per tweet. When a tweet receives multiple tags, its net tone is duplicated across those tags for subsequent aggregation. This imposes high-level semantic structure on otherwise opaque textual data.
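The duplication of net tone across tags can be illustrated with a minimal sketch. The tweet records, the tickers, and the function names `explode_by_label` and `aggregate_net_tone` are hypothetical, and averaging within each (ticker, date, label) cell is an assumption; the text above says only that duplicated scores are aggregated afterwards.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical tagged tweets: (ticker, date, net_tone, labels assigned by the LLM).
tweets = [
    ("AAA", "2024-01-02", -0.8, ["Rumor/Speculation", "Retail Investor Buzz"]),
    ("AAA", "2024-01-02", -0.4, ["Rumor/Speculation"]),
    ("BBB", "2024-01-02", 0.6, ["Brand Boycott"]),
]


def explode_by_label(rows):
    """Yield one (ticker, date, label, net_tone) record per assigned label.

    A tweet with k labels produces k records that all carry the same net tone.
    """
    for ticker, date, net_tone, labels in rows:
        for label in labels:
            yield ticker, date, label, net_tone


def aggregate_net_tone(rows):
    """Average net tone per (ticker, date, label) cell.

    Using the mean is an assumption; a sum or a count-weighted score
    would fit the same structure.
    """
    buckets = defaultdict(list)
    for ticker, date, label, net_tone in explode_by_label(rows):
        buckets[(ticker, date, label)].append(net_tone)
    return {key: mean(values) for key, values in buckets.items()}


factors = aggregate_net_tone(tweets)
for key, value in sorted(factors.items()):
    print(key, round(value, 3))
```

Each resulting cell is one observation of an event-level factor, which can then be aligned with forward returns for the same ticker and date.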

While our core strategy demonstrated success using a lexicon-based approach, we conducted an exploratory analysis to probe the potential of Large Language Models (LLMs) for discovering more sophisticated, theme-driven trading signals.

We prompted a pre-trained financial LLM to classify tweets not merely by their polarity but by their underlying narrative themes, such as “Speculation/Rumor,” “Retail Investor Buzz,” or “Geopolitical Tension.” We then analyzed the predictive power of these thematic labels.

The results, presented in Table 2, reveal distinct and powerful patterns. The analysis shows that several thematic labels are potent contrarian indicators. For instance, portfolios formed on days with a high prevalence of tweets about “Speculation/Rumor” or “Geopolitical Tension” consistently yielded statistically significant negative returns across multiple time horizons (1-day to 7-day), as evidenced by their significant negative Sharpe ratios. This suggests that a high volume of discussion around these particular themes is a precursor to negative price movements. The “Retail Investor Buzz” category shows a more complex dynamic, starting as a negative signal before its Information Coefficient (IC) turns positive at the 7-day horizon, hinting at a possible short-term overreaction and subsequent reversal.
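For readers unfamiliar with the two evaluation metrics, the following sketch shows one common way to compute them. It assumes the IC is a Spearman rank correlation between a signal and forward returns, and that the Sharpe ratio is annualized with a zero risk-free rate; the exact definitions behind Table 2 are not given in this passage, and all data and function names below are illustrative.

```python
from math import sqrt
from statistics import mean, stdev


def forward_return(prices, t, horizon):
    """Simple return from day t to day t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # average of positions start..end, 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation of two equal-length, non-constant sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def information_coefficient(signal, forward_returns):
    """Spearman rank correlation between a signal and forward returns."""
    return pearson(average_ranks(signal), average_ranks(forward_returns))


def sharpe_ratio(period_returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    return mean(period_returns) / stdev(period_returns) * sqrt(periods_per_year)


# Illustrative cross-section: a higher signal is paired with a lower forward return.
signal = [0.9, 0.1, 0.5, 0.3]
fwd = [-0.02, 0.01, -0.01, 0.00]
ic = information_coefficient(signal, fwd)

# Illustrative daily returns of a portfolio formed on one thematic label.
daily = [0.01, -0.02, 0.005, -0.015]
sr = sharpe_ratio(daily)
print(ic, sr)
```

Under these definitions, a contrarian label shows up as a negative IC together with a negative Sharpe ratio for the portfolio formed on it.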

This analysis demonstrates that LLMs can move beyond simple positive/negative sentiment to uncover sophisticated, theme-driven market dynamics. While these themes may act as contrarian indicators on their own, they can still be harnessed to build profitable strategies. Figure 3 illustrates the performance of a long-only portfolio constructed by sorting stocks based on their LLM-generated thematic scores and investing in the top quintile. The positive cumulative return across different horizons highlights the potential of using these nuanced LLM-derived signals to construct alpha-generating portfolios.
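The sort-and-select step behind such a long-only portfolio can be sketched as follows. This is an illustration under stated assumptions rather than the construction used for Figure 3: equal weighting, the tickers, the scores, and the returns are all hypothetical, as are the function names.

```python
def top_quintile(scores):
    """Return the tickers whose score is in the top 20% of the cross-section."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_select = max(1, len(ranked) // 5)  # always hold at least one name
    return ranked[:n_select]


def portfolio_return(selected, forward_returns):
    """Equal-weighted forward return of the selected tickers (an assumption)."""
    return sum(forward_returns[t] for t in selected) / len(selected)


def cumulative_return(period_returns):
    """Compound a sequence of per-period returns into one cumulative return."""
    total = 1.0
    for r in period_returns:
        total *= 1.0 + r
    return total - 1.0


# Hypothetical thematic scores and forward returns for ten tickers on one date.
scores = {f"T{i}": i / 10 for i in range(10)}
forward_returns = {f"T{i}": 0.001 * i for i in range(10)}

selected = top_quintile(scores)
period = portfolio_return(selected, forward_returns)
print(selected, period)
```

Repeating the selection on each rebalancing date and passing the resulting per-period returns to `cumulative_return` gives a cumulative performance curve for one holding horizon.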