Why do language models fail at communicative optimization?
LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
"Do Large Language Models Resemble Humans in Language Use?" (Yiu et al. 2023) evaluates LLMs on a wide range of human linguistic regularities — not just grammaticality but psycholinguistic phenomena. The results show a consistent pattern of success and failure that tracks a specific distinction.
LLMs succeed on:
- Sound symbolism (maluma/takete roundness-spikiness associations)
- Sound-gender associations (judging the likely gender of novel names from their sound)
- Structural priming (reusing the syntactic structure of the prime)
- Semantic priming (accessing recently primed word meanings)
- Dialect sensitivity (accessing dialect-appropriate vocabulary based on stated interlocutor identity)
These regularities are learnable from distributional patterns in text — they appear consistently across large corpora and can be acquired through form-to-form prediction.
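Both priming effects, for instance, can be read straight off next-token probabilities. Below is a minimal sketch of a semantic priming probe, assuming a Hugging Face causal LM (GPT-2 here); the doctor/nurse pair and the prompts are illustrative, not the paper's stimuli:

```python
# Minimal semantic priming probe: a related prime word in the context
# should raise the log-probability of the target word.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def target_logprob(context: str, target: str) -> float:
    """Summed log-probability of `target` as the continuation of `context`."""
    context_ids = tokenizer.encode(context)
    target_ids = tokenizer.encode(" " + target)  # leading space starts a new word
    input_ids = torch.tensor([context_ids + target_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # Each target token is predicted at the position just before it.
    return sum(
        log_probs[0, len(context_ids) + i - 1, tok].item()
        for i, tok in enumerate(target_ids)
    )

related = target_logprob("The doctor greeted the", "nurse")
unrelated = target_logprob("The carpenter greeted the", "nurse")
print(f"priming effect (log-odds): {related - unrelated:+.2f}")
```

A positive priming effect here is exactly the kind of regularity that form-to-form prediction delivers for free.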
LLMs fail on:
- Word length economy (choosing shorter forms when the context is predictive, the Zipfian efficiency principle; probed in the sketch below)
- Syntactic ambiguity resolution (selecting the contextually appropriate reading of ambiguous syntax)
- Semantic illusions (detecting incongruent words in otherwise coherent sentences)
- Drawing discourse inferences (bridging inferences that connect two pieces of information across a discourse)
These regularities require something beyond distributional pattern matching. They involve principles of why language works for communication: efficiency under communicative pressure, contextual interpretation that goes beyond local statistics, and integration across discourse.
The discriminating principle: statistical regularities that appear as consistent surface patterns in training data transfer to LLMs. Regularities that emerge from communicative optimization, the pragmatic logic of why language has the forms it does, do not transfer, because they are not present in surface form as trainable signals.
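The word length economy failure can be probed the same way. The sketch below reuses target_logprob from the priming sketch above; the math/mathematics pair and the two contexts are illustrative, not the paper's stimuli. Humans shift toward the short form when the context already carries the information, so a human-like model should show a larger short-form advantage in the predictive context:

```python
# Word length economy probe: does the model prefer the short form
# ("math") over the long form ("mathematics") more strongly when the
# context makes the word predictable? Reuses target_logprob from above.
predictive = "She failed the algebra test because she was terrible at"
neutral = "At the party she told me about her interest in"

for label, context in [("predictive", predictive), ("neutral", neutral)]:
    short = target_logprob(context, "math")
    long_form = target_logprob(context, "mathematics")
    # Human-like economy predicts an interaction: the short-form
    # advantage should grow with predictability, not just track the
    # short form's raw frequency.
    print(f"{label:10s} log-odds(short vs long) = {short - long_form:+.2f}")
```

The diagnostic is the difference between the two log-odds values, not either one alone: a model that prefers "math" everywhere is tracking frequency, not communicative efficiency.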
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Can models pass tests while missing the actual grammar?
  Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
  Relation: this paper provides the specific taxonomy of which generalizations do and do not transfer.
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  Relation: the same pattern of surface-explicit success and contextual-implicit failure.
- Why do LLMs handle causal reasoning better than temporal reasoning?
  Explores whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
  Relation: a parallel case where what is in the training distribution determines what is learned.
- Why do speakers deliberately use ambiguous language?
  Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems.
  Relation: ambiguity management is one of the communicative optimization principles that fails to transfer.
- Why don't LLMs shorten messages like humans do?
  Humans naturally develop shorter, more efficient language over the course of a conversation. Do multimodal LLMs exhibit this same spontaneous adaptation, or do they lack this communicative behavior?
  Relation: the ICCA finding that convention formation (becoming more efficient through interaction) is precisely a communicative optimization principle LLMs fail to acquire; they understand efficient language but don't produce it.
Original note title: LLMs replicate local statistical regularities in language but fail to acquire communicative optimization principles