Language Understanding and Pragmatics

Can language models adapt implicature to conversational context?

Do large language models flexibly modulate scalar implicatures based on information structure, face-threatening situations, and explicit instructions—as humans do? This tests whether pragmatic computation is truly context-sensitive or merely literal.

Note · 2026-02-21 · sourced from Linguistics, NLP, NLU
Where exactly does language competence break down in LLMs? How should researchers navigate LLM reasoning research?

Scalar implicatures are a core pragmatic phenomenon: when someone says "some," it typically implies "not all." This is not semantically entailed but pragmatically inferred based on the maxim of quantity — if all were true, the speaker would have said "all." Human computation of these implicatures is sensitive to communicative context in documented ways.
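The maxim-of-quantity reasoning above can be made concrete with a minimal Rational Speech Acts (RSA) style computation — a standard way of modeling scalar implicature, not the paper's experimental method. The states, utterance set, and rationality parameter alpha below are toy assumptions for illustration:

```python
# Minimal RSA sketch of how "some" acquires a "not all" reading.
# Toy model: states, utterances, and alpha are illustrative assumptions.

states = ["none", "some_not_all", "all"]
utterances = ["some", "all"]

# Literal semantics: "some" is true whenever at least one is included,
# so it is literally compatible with "all".
meaning = {
    "some": {"none": 0, "some_not_all": 1, "all": 1},
    "all":  {"none": 0, "some_not_all": 0, "all": 1},
}

def normalize(d):
    total = sum(d.values())
    return {k: v / total for k, v in d.items()} if total else d

def literal_listener(u):
    # L0: uniform prior over states, filtered by literal truth.
    return normalize({s: float(meaning[u][s]) for s in states})

def speaker(s, alpha=4.0):
    # S1: prefers utterances under which L0 makes the true state likely —
    # so a speaker in the "all" state strongly prefers saying "all".
    scores = {u: literal_listener(u)[s] ** alpha
              for u in utterances if meaning[u][s]}
    return normalize(scores)

def pragmatic_listener(u):
    # L1: Bayesian inversion of the speaker (uniform state prior).
    return normalize({s: speaker(s).get(u, 0.0) for s in states})

# Hearing "some", L1 puts ~0.94 on "some but not all": the implicature.
print(pragmatic_listener("some"))
```

The inference falls out of inverting the speaker model: since a speaker in the "all" state would almost certainly have said "all", hearing "some" shifts probability onto "some but not all" — exactly the "if all were true, the speaker would have said 'all'" step.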

Three experiments from Pragmatic Implicature Processing in ChatGPT (Ruytenbeek et al. 2024) tested whether ChatGPT shows human-like context-sensitivity in implicature. ChatGPT failed all three:

Generalized conversational implicatures: Humans can inhibit implicature computation when explicitly instructed to interpret utterances literally. ChatGPT failed to show this flexibility: it does not switch between pragmatic and literal interpretation on instruction.

Information structure sensitivity: For scalar implicatures, humans compute more "some but not all" inferences when the scalar term is in the information focus (the direct answer to an explicit question) than when it is in the background. ChatGPT showed no sensitivity to information structure.

Face context: Human scalar implicature rates differ between face-threatening and face-boosting contexts. If a poem is being evaluated and someone says "some people loved it," the implicature ("not all loved it") is more prominent in face-boosting contexts. ChatGPT showed no differential response to face context.

These are not exotic phenomena. They are the basic flexibility that allows human conversation to be more than literal string exchange. Pragmatic competence requires tracking the communicative context — who is asking, why, what stakes are involved — and modulating interpretation accordingly. ChatGPT's failure is not isolated to edge cases; it extends to routine context-modulation effects that appear in any human conversation.

A complementary finding in non-literal language: GPT-4o significantly overestimates irony likelihood in emojis compared to human perception (median irony scores higher than human medians, W = 918.5, p < .001). When prompted to rate the likelihood of specific emojis being used ironically, GPT-4o considers the same emojis more likely to express irony than humans do — possibly due to disproportionate representation of ironic emoji usage in training data. Demographic information in prompts does not substantially change GPT-4o's irony classification. This parallels the implicature failure: the model cannot calibrate to actual human pragmatic norms for non-literal communication, whether the signal is scalar implicature or visual irony.
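The W statistic reported above is, plausibly, a Wilcoxon signed-rank statistic for paired ratings. A pure-Python sketch of how that statistic is computed — the ratings below are made-up toy numbers, not the study's data:

```python
# Sketch of the Wilcoxon signed-rank statistic W = min(W+, W-),
# computed from paired ratings. Toy data for illustration only.

def signed_rank_W(x, y):
    """Return min(W+, W-) for paired samples x, y (zero diffs dropped)."""
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    # Assign ranks by |difference|, averaging ranks over ties.
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical paired irony ratings (model vs. human, 1-7 scale):
model = [6, 5, 7, 6, 4, 6, 5]
human = [3, 4, 5, 6, 2, 4, 3]
# Every model rating here is >= the human rating, so W = 0: the
# one-sided pattern that signals systematic overestimation.
print(signed_rank_W(model, human))
```

A small W relative to the number of pairs means the differences are almost all in one direction — here, the model rating irony higher than humans across the board.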
