Irony in Emojis: A Comparative Study of Human and LLM Interpretation

Paper · arXiv 2501.11241 · Published January 20, 2025

Irony poses a significant challenge for Large Language Models (LLMs) due to its inherent incongruity between appearance and intent. This study examines the ability of GPT-4o to interpret irony in emojis. By prompting GPT-4o to evaluate the likelihood of specific emojis being used to express irony on social media and comparing its interpretations with human perceptions, we aim to bridge the gap between machine and human understanding. Our findings reveal nuanced insights into GPT-4o’s interpretive capabilities, highlighting areas

Emojis often embody irony through the contrast between their outward appearance and intended meaning, complicating their interpretation. Their euphemistic, humorous, and context-dependent uses further challenge the ability of LLMs to accurately discern sentiment (Lyu et al. 2024b). Addressing this challenge is essential, as accurate detection of irony in emojis could significantly enhance applications such as virtual assistants, chatbots, and sentiment analysis tools (Lyu et al. 2024a).

Specifically, we measure the frequency with which an emoji is used to convey irony in real-world social media posts and calculate its relative proportion of ironic usage as an irony score. We follow Xiang et al. (2020) and define “irony” as instances where an emoji conveys a meaning opposite to its literal interpretation, resulting in a reversal of understanding.
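The irony score described above can be illustrated with a minimal sketch. It assumes each emoji occurrence has already been labeled as ironic or not; the function name `irony_scores`, the input layout, and the toy data are hypothetical and are not taken from the study, which does not spell out its exact normalization in this passage.

```python
from collections import Counter


def irony_scores(occurrences):
    """Return, per emoji, the share of its occurrences labeled ironic.

    `occurrences` is an iterable of (emoji, is_ironic) pairs, one pair per
    emoji occurrence in a post. This layout is an assumption for illustration.
    """
    total = Counter()
    ironic = Counter()
    for emoji, is_ironic in occurrences:
        total[emoji] += 1
        if is_ironic:
            ironic[emoji] += 1
    # Relative proportion of ironic usage for every emoji seen at least once.
    return {emoji: ironic[emoji] / total[emoji] for emoji in total}


# Toy, made-up labels (not data from the paper).
sample = [
    ("😏", True), ("😏", False), ("😏", True),
    ("🙂", False), ("🙂", True), ("🙂", False), ("🙂", False),
]
scores = irony_scores(sample)
for emoji, score in scores.items():
    print(emoji, round(score, 2))
```

Under this reading, an emoji that appears rarely but almost always ironically scores higher than a common emoji that is only occasionally ironic.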

The exact prompt reads: “Imagine you are a social media user; rate your likelihood of using this emoji if your intention is to express irony on an 11-point scale (11=very likely, 1=very unlikely). The rating may depend on the context. You need to give the most likely rating and your explanation. Do not give multiple ratings in terms of scenarios. Only one rating is required.” Following the approach of Częstochowska et al. (2022) and Lyu et al. (2024b), we provide the model with emojis in image format. We further investigate whether GPT-4o’s classification changes when demographic information is included in the prompt by revising the first sentence to “Imagine you are a [gender] social media user aged [age] ...”
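The two prompt variants can be assembled as in the following sketch. It assumes the demographic variant changes only the opening clause and leaves the rest of the prompt unchanged, which the elided quotation suggests but does not confirm. The helper `build_prompt` and the example values "female" and "18-24" are hypothetical. Attaching the emoji image and calling the model are left out.

```python
# Remainder of the prompt, quoted from the text above.
PROMPT_REST = (
    "; rate your likelihood of using this emoji if your intention is to "
    "express irony on an 11-point scale (11=very likely, 1=very unlikely). "
    "The rating may depend on the context. You need to give the most likely "
    "rating and your explanation. Do not give multiple ratings in terms of "
    "scenarios. Only one rating is required."
)


def build_prompt(gender=None, age=None):
    """Build the base prompt, or the demographic variant if both fields are given."""
    if gender is None and age is None:
        opening = "Imagine you are a social media user"
    elif gender is not None and age is not None:
        # Assumption: only the opening clause differs in the demographic variant.
        opening = f"Imagine you are a {gender} social media user aged {age}"
    else:
        raise ValueError("provide both gender and age, or neither")
    return opening + PROMPT_REST


base_prompt = build_prompt()
demo_prompt = build_prompt(gender="female", age="18-24")  # hypothetical values
print(demo_prompt)
```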

We find that the median irony score assigned by GPT-4o is significantly higher than the score perceived by humans (W = 918.5, p < .001). This indicates that, on average, GPT-4o considers the same emoji more likely to be used for expressing irony than humans do. This discrepancy may stem from GPT-4o being trained on data with a disproportionate representation of ironic emoji usage.
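This passage does not name the test behind the W statistic. One common choice for comparing paired ratings of the same items is the Wilcoxon signed-rank test, and the sketch below computes its rank sums under that assumption. The function name, the pairing of model and human scores per emoji, and the toy numbers are all hypothetical. It does not reproduce the reported W = 918.5.

```python
def signed_rank_sums(model_scores, human_scores):
    """Return (W_plus, W_minus) for paired samples.

    Zero differences are dropped and tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [m - h for m, h in zip(model_scores, human_scores) if m != h]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Toy paired ratings on a common scale (made up, not from the paper).
model = [9, 8, 7, 10, 6]
human = [5, 6, 7, 4, 7]
w_plus, w_minus = signed_rank_sums(model, human)
print(w_plus, w_minus)
```

A large W_plus relative to W_minus means the model's ratings tend to exceed the human ones. Obtaining a p-value would additionally require the test's null distribution, for example from a statistics package such as SciPy's `scipy.stats.wilcoxon`.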

To explore this alignment further, we prompt GPT-4o to interpret emojis. For example, GPT-4o explains that the smirking emoji “😏” can convey irony due to its nuanced facial expression. The smirk’s inherent ambiguity makes it well-suited for ironic statements, where the intended meaning diverges from the literal interpretation. This emoji can suggest sentiments like “I know something you don’t” or “I’m not being entirely serious,” which are consistent with the subtle and indirect nature of irony.