Do language models overestimate how often irony appears?
This note explores whether LLMs systematically misread ironic intent in text, assigning higher irony scores than humans do. The gap suggests that models learn irony patterns from training data without learning how often irony actually occurs in real communication.
GPT-4o can interpret ironic intent in emoji usage, but it systematically overestimates that intent relative to humans: the median irony score assigned by GPT-4o is significantly higher than the median assigned by human raters (p < .001). LLMs detect irony as a category but miscalibrate its prevalence (Irony in Emojis: A Comparative Study of Human and LLM Interpretation).
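To make the shape of that comparison concrete, here is a minimal sketch of testing whether one set of ordinal irony ratings skews higher than another. The rating scale, sample sizes, and score distributions below are invented for illustration; they are not the study's data.

```python
# Hypothetical sketch: comparing LLM-assigned and human-assigned irony scores
# for the same set of emoji-bearing texts. All values are simulated.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)

# Irony scores on an assumed 1-7 scale, one value per annotated text.
human_scores = rng.integers(1, 5, size=200)   # humans cluster low
llm_scores = rng.integers(3, 8, size=200)     # model skews high

# One-sided test: are LLM ratings stochastically greater than human ratings?
stat, p = mannwhitneyu(llm_scores, human_scores, alternative="greater")
print(f"median human={np.median(human_scores)}, median LLM={np.median(llm_scores)}")
print(f"Mann-Whitney U={stat:.0f}, one-sided p={p:.2g}")
```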
This overestimation reveals something important about how LLMs process pragmatic meaning. Irony detection is a pattern-matching success: the model has learned which textual features correlate with ironic intent in its training data. But ironic patterns are over-represented in training data relative to their actual frequency in human communication, because ironic usage is more salient, more commented upon, and more explicitly labeled than sincere usage. The model learns the pattern but not the base rate.
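A toy Bayes-rule calculation makes the mechanism explicit: with the same learned likelihoods, an inflated prior for irony inflates every downstream judgment. All numbers here are invented to show the mechanism; none come from the source.

```python
# Hedged illustration of base-rate miscalibration via Bayes' rule.

def p_irony_given_cue(p_cue_given_irony, p_cue_given_sincere, prior_irony):
    """Posterior probability of ironic intent given an irony-suggestive cue."""
    p_cue = (p_cue_given_irony * prior_irony
             + p_cue_given_sincere * (1 - prior_irony))
    return p_cue_given_irony * prior_irony / p_cue

# Same likelihoods (the "pattern" the model genuinely learned) ...
likelihoods = dict(p_cue_given_irony=0.8, p_cue_given_sincere=0.2)

# ... but different priors: training-data saliency vs. real-world frequency.
print("training prior 0.40 ->", round(p_irony_given_cue(**likelihoods, prior_irony=0.40), 2))  # ~0.73
print("real-world prior 0.10 ->", round(p_irony_given_cue(**likelihoods, prior_irony=0.10), 2))  # ~0.31
```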
This is a specific instance of a broader calibration problem. As "Why do preference models favor surface features over substance?" argues, training data artifacts systematically distort model judgments across multiple dimensions. Irony overestimation is the pragmatic version: the model's sense of "how often is this ironic?" is calibrated to training-data saliency, not to real-world frequency.
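If a mis-set prior is the whole story, the miscalibration is in principle correctable by re-weighting the model's output for the base rate believed to hold in real communication. The sketch below shows a standard prior-shift adjustment; both priors are assumptions chosen to match the toy example above, not measured values.

```python
# Hedged sketch: correcting a model's irony probability for a different base rate.

def prior_shift(p_model, prior_train, prior_true):
    """Re-weight a posterior computed under prior_train to reflect prior_true."""
    odds = (p_model / (1 - p_model)) * (prior_true / prior_train) \
           * ((1 - prior_train) / (1 - prior_true))
    return odds / (1 + odds)

# A model score of 0.73 under an assumed training prior of 0.40 drops to ~0.31
# once adjusted to an assumed real-world irony rate of 0.10.
print(round(prior_shift(0.73, prior_train=0.40, prior_true=0.10), 2))
```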
The implication for literary analysis is significant. Literary irony is subtle, context-dependent, and often works through understatement, exactly the opposite of the salient, explicitly marked irony that dominates training data. A model that over-reads ironic intent will find irony where an author intended none, and may miss genuine irony that operates through restraint rather than exaggeration. As "Can language models adapt implicature to conversational context?" argues, the failure to calibrate irony to context is part of a larger pattern: LLMs apply fixed pragmatic templates where communicative context should modulate interpretation.
Source: inbox/research-brief-llm-literary-analysis-2026-03-02.md
Related concepts in this collection
- Why do preference models favor surface features over substance? Preference models show systematic bias toward length, structure, jargon, sycophancy, and vagueness—features humans actively dislike. Understanding this 40% divergence reveals whether it stems from training data artifacts or architectural constraints. (Connection: calibration bias from training data saliency)
- Can language models adapt implicature to conversational context? Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions—as humans do? This tests whether pragmatic computation is truly context-sensitive or merely literal. (Connection: fixed pragmatic templates where context should modulate)
- Why do speakers deliberately use ambiguous language? Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems. (Connection: irony operates through productive ambiguity between literal and intended meaning)
Original note title: LLM irony detection systematically overestimates ironic intent — calibration bias reveals pattern recognition without pragmatic understanding