Language Understanding and Pragmatics

Why do readers interpret the same sentence so differently?

How much of annotation disagreement in NLP reflects genuine interpretive multiplicity rather than error? This note explores whether social position and moral framing systematically generate competing but equally valid readings of the same sentence.

Note · 2026-02-21 · sourced from Linguistics, NLP, NLU

The standard assumption underlying NLP benchmark design is that sentences have one correct interpretation. Disagreement between annotators signals annotation failure. The solution is to filter or adjudicate until one answer emerges.
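This filter-or-adjudicate pipeline can be made concrete. A minimal sketch (hypothetical thresholds, not any specific benchmark's recipe): keep an item only when annotators agree strongly, and record the majority label as gold.

```python
from collections import Counter

def adjudicate(labels, min_agreement=0.8):
    """Standard benchmark practice (sketch): keep an example only if
    annotators agree strongly enough, and record the majority label.
    Returns the gold label, or None if the item is filtered out."""
    counts = Counter(labels)
    label, n = counts.most_common(1)[0]
    return label if n / len(labels) >= min_agreement else None

# An item with clear agreement survives; a contested one is discarded.
print(adjudicate(["entailment"] * 4 + ["neutral"]))      # entailment
print(adjudicate(["entailment", "entailment", "neutral",
                  "neutral", "contradiction"]))          # None
```

Note what the second call throws away: not just a label, but the entire shape of the disagreement.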

Interpretation Modeling (IM, Cercas Curry et al. 2023) challenges this assumption directly. The study models multiple interpretations of socially embedded sentences, guided by reader attitudes toward the author and reader understanding of implicit moral judgments. Finding: conflicting interpretations are socially plausible. They reflect different social positions and moral framings, not annotation error.
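The core move can be sketched as a data structure. This is a hypothetical illustration of IM's framing, not the paper's actual schema: one surface sentence paired with several readings, each conditioned on a reader attitude and an implicit moral frame.

```python
from dataclasses import dataclass

@dataclass
class Interpretation:
    reading: str          # paraphrase of what this reader understands
    reader_attitude: str  # e.g. sympathetic vs. skeptical toward the author
    moral_frame: str      # the implicit moral judgment the reader assumes

# Hypothetical example: one sentence, two socially plausible readings,
# neither reducible to the other.
sentence = "She finally spoke up at the meeting."
readings = [
    Interpretation("praise: she overcame pressure to stay silent",
                   reader_attitude="sympathetic",
                   moral_frame="courage deserves credit"),
    Interpretation("criticism: she should have spoken up long ago",
                   reader_attitude="skeptical",
                   moral_frame="delay is blameworthy"),
]
assert len({r.reading for r in readings}) > 1  # conflicting, both plausible
```

The point of the structure is that the attitude and moral frame are inputs to interpretation, not noise to be averaged away.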

This is not about ambiguous sentences in the traditional sense (lexical or syntactic ambiguity) but about the social and implicit dimensions of meaning in natural communication. A sentence embedded in a social context carries different meanings for readers who differ in their attitudes toward the author, their own social positions, and the implicit moral judgments they take for granted.

The interpretations that result are not all "correct" in a truth-conditional sense, but they are all "valid" in a socially and pragmatically grounded sense — readers with different social positions genuinely understand different things from the same text.

The implication is uncomfortable for NLP: the gold standard that benchmarks aspire to may not exist for a substantial portion of natural language. Treating disagreement as noise produces evaluation systems that measure agreement on easy cases while missing the hard question of how interpretation actually works.

The NLI disagreement literature provides statistical confirmation. "Lost in Inference" (analyzing NLI annotation disagreement across major benchmarks) finds that NLI task performance is not saturated — humans continue to disagree, and that disagreement is not random noise but structured. Human annotation distributions on contested examples carry information that the majority label discards. This is the empirical grounding for IM's theoretical claim: interpretation is irreducibly multiple, and the distribution over interpretations is itself meaningful data.
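One way to see that a distribution over labels carries information the majority label discards: compute its Shannon entropy. A minimal sketch (illustrative data, not drawn from any benchmark):

```python
import math
from collections import Counter

def label_distribution(labels):
    """Empirical distribution over annotator labels for one item."""
    total = len(labels)
    return {lab: c / total for lab, c in Counter(labels).items()}

def entropy(dist):
    """Shannon entropy in bits: 0 means unanimous, higher means contested."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

easy = ["entailment"] * 5
hard = ["entailment", "entailment", "neutral", "neutral", "contradiction"]

print(entropy(label_distribution(easy)))  # 0.0
print(round(entropy(label_distribution(hard)), 2))  # 1.52
```

Both items would get a single gold label under majority voting, but the entropies distinguish a genuinely unanimous case from a structurally contested one, which is exactly the signal the NLI disagreement literature argues is meaningful.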

An additional mechanism: social identity projection. Readers don't just apply their moral frameworks abstractly — they project the likely social identity of the author based on textual cues, then interpret the content through the lens of that projected identity. Two readers who project different author identities from the same text will read the same words as carrying different social stances. This is a grounding claim about interpretation that goes beyond semantic ambiguity.

This connects to Why do speakers deliberately use ambiguous language? — interpretive multiplicity is not a failure of specification but a feature of how socially embedded language operates. And if, as Do standard NLP benchmarks hide LLM ambiguity failures? argues, benchmarks already mask ambiguity failures, then this irreducibility is doubly hidden: benchmarks assume a single gold reading and conceal where models break on contested ones.

