Why do language models fail at communicative optimization?
LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
"Do Large Language Models Resemble Humans in Language Use?" (Yiu et al. 2023) evaluates LLMs on a wide range of human linguistic regularities — not just grammaticality but psycholinguistic phenomena. The results show a consistent pattern of success and failure that tracks a specific distinction.
LLMs succeed on:
- Sound symbolism (maluma/takete roundness-spikiness associations)
- Sound-gender associations (judging the likely gender of novel names from their sound)
- Structural priming (reusing the syntactic structure of the prime)
- Semantic priming (accessing recently primed word meanings)
- Dialect sensitivity (accessing dialect-appropriate vocabulary based on stated interlocutor identity)
These regularities are learnable from distributional patterns in text — they appear consistently across large corpora and can be acquired through form-to-form prediction.
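Both priming effects, for instance, can be read straight off next-token probabilities. Below is a minimal sketch of a semantic priming probe, assuming a Hugging Face causal LM (GPT-2 here); the doctor/nurse pair and the prompts are illustrative, not the paper's stimuli:

```python
# Minimal semantic priming probe: a related prime word in the context
# should raise the log-probability of the target word.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def target_logprob(context: str, target: str) -> float:
    """Summed log-probability of `target` as the continuation of `context`."""
    context_ids = tokenizer.encode(context)
    target_ids = tokenizer.encode(" " + target)  # leading space starts a new word
    input_ids = torch.tensor([context_ids + target_ids])
    with torch.no_grad():
        log_probs = torch.log_softmax(model(input_ids).logits, dim=-1)
    # Each target token is predicted at the position just before it.
    return sum(
        log_probs[0, len(context_ids) + i - 1, tok].item()
        for i, tok in enumerate(target_ids)
    )

related = target_logprob("The doctor greeted the", "nurse")
unrelated = target_logprob("The carpenter greeted the", "nurse")
print(f"priming effect (log-odds): {related - unrelated:+.2f}")
```

A positive priming effect here is exactly the kind of regularity that form-to-form prediction delivers for free.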
LLMs fail on:
- Word length economy (choosing shorter forms when the context is predictive, the Zipfian efficiency principle; probed in the sketch below)
- Syntactic ambiguity resolution (selecting the contextually appropriate reading of ambiguous syntax)
- Semantic illusions (detecting incongruent words in otherwise coherent sentences)
- Drawing discourse inferences (bridging inferences that connect two pieces of information across a discourse)
These regularities require something beyond distributional pattern matching. They involve principles of why language works for communication: efficiency under communicative pressure, contextual interpretation that goes beyond local statistics, and integration across discourse.
The discriminating principle: statistical regularities that appear as consistent surface patterns in training data transfer to LLMs. Regularities that emerge from communicative optimization, the pragmatic logic of why language has the forms it does, do not transfer, because they are not present in surface form as trainable signals.
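The word length economy failure can be probed the same way. The sketch below reuses target_logprob from the priming sketch above; the math/mathematics pair and the two contexts are illustrative, not the paper's stimuli. Humans shift toward the short form when the context already carries the information, so a human-like model should show a larger short-form advantage in the predictive context:

```python
# Word length economy probe: does the model prefer the short form
# ("math") over the long form ("mathematics") more strongly when the
# context makes the word predictable? Reuses target_logprob from above.
predictive = "She failed the algebra test because she was terrible at"
neutral = "At the party she told me about her interest in"

for label, context in [("predictive", predictive), ("neutral", neutral)]:
    short = target_logprob(context, "math")
    long_form = target_logprob(context, "mathematics")
    # Human-like economy predicts an interaction: the short-form
    # advantage should grow with predictability, not just track the
    # short form's raw frequency.
    print(f"{label:10s} log-odds(short vs long) = {short - long_form:+.2f}")
```

The diagnostic is the difference between the two log-odds values, not either one alone: a model that prefers "math" everywhere is tracking frequency, not communicative efficiency.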
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Can models pass tests while missing the actual grammar?
  Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
  Relation: this paper provides the specific taxonomy of which generalizations do and do not transfer.
- Why does ChatGPT fail at implicit discourse relations?
  ChatGPT excels when discourse connectives are present but drops to 24% accuracy without them. What does this gap reveal about how LLMs actually process meaning and logical relationships?
  Relation: the same pattern of surface-explicit success and contextual-implicit failure.
- Why do LLMs handle causal reasoning better than temporal reasoning?
  Explores whether language models perform asymmetrically on different discourse relations and what training data patterns might explain the gap between causal and temporal reasoning abilities.
  Relation: a parallel case where what is in the training distribution determines what is learned.
- Why do speakers deliberately use ambiguous language?
  Explores whether ambiguity is a linguistic defect or a strategic tool speakers use for efficiency, politeness, and deniability. Matters because it challenges how we train language systems.
  Relation: ambiguity management is one of the communicative optimization principles that fails to transfer.
- Why don't LLMs shorten messages like humans do?
  Humans naturally develop shorter, more efficient language over the course of a conversation. Do multimodal LLMs exhibit this same spontaneous adaptation, or do they lack this communicative behavior?
  Relation: the ICCA finding that convention formation (becoming more efficient through interaction) is precisely a communicative optimization principle LLMs fail to acquire; they understand efficient language but don't produce it.
Original note title: LLMs replicate local statistical regularities in language but fail to acquire communicative optimization principles