Why don't LLMs shorten messages like humans do?
Humans naturally develop shorter, efficient language during conversations. Do multimodal LLMs exhibit this same spontaneous adaptation, or do they lack this communicative behavior?
Humans spontaneously develop increasingly efficient language during interactions. A patient might start with "the medicine for my back pain in a small blue medicine bottle" and within a week say just "my back meds." This is lexical convention formation — a fundamental property of human communication documented extensively in reference game studies.
The ICCA framework evaluates whether multimodal LLMs exhibit this behavior. The results reveal an asymmetry: as listeners, models like GPT-4 show adaptation trends close to humans, improving accuracy as interactions progress. But as speakers, all models fail to spontaneously improve communication efficiency. Only with fairly heavy-handed instruction — explicitly telling the model to reduce message length and maintain lexical consistency — do GPT-4, Gemini, and Claude show partial adaptation.
Four prompting variants reveal the gradient (illustrative speaker prompts are sketched after the list):
- Standard: no mention of efficiency → no adaptation
- Gricean instruction: Grice's quantity maxim ("don't be more informative than necessary") → minimal effect
- Explicit instruction: "gradually condense your messages" → some length reduction
- Explicit + consistency: "extract salient tokens from previous messages" → closest to human behavior
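A minimal sketch of how these four conditions might be expressed as speaker-side instructions. The exact ICCA prompt wording is not reproduced here, so these strings and the dictionary layout are illustrative assumptions only.

```python
# Hypothetical speaker-side instructions for the four conditions; the exact
# wording used in ICCA differs -- these strings are illustrative only.
SPEAKER_PROMPT_VARIANTS = {
    "standard": (
        "Describe the target image so your partner can pick it out of the set."
    ),
    "gricean": (
        "Describe the target image so your partner can pick it out of the set. "
        "Do not make your message more informative than necessary."
    ),
    "explicit_length": (
        "Describe the target image so your partner can pick it out of the set. "
        "As the game progresses, gradually condense your messages."
    ),
    "explicit_length_consistency": (
        "Describe the target image so your partner can pick it out of the set. "
        "As the game progresses, gradually condense your messages by reusing "
        "the salient tokens from your previous messages about the same image."
    ),
}
```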
The Word Novelty Rate (WNR) metric captures this precisely: it counts word insertions and substitutions while ignoring deletions, since deletions reflect natural convention formation, whereas new or swapped words are the kind of change that increases the listener's cognitive load.
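A minimal sketch of the metric as described above, using a word-level edit alignment between consecutive messages; normalizing by the length of the current message is an assumption of this sketch, not necessarily how ICCA reports WNR.

```python
from difflib import SequenceMatcher

def word_novelty_rate(prev_msg: str, curr_msg: str) -> float:
    """Word Novelty Rate between consecutive messages about the same referent.

    Counts words in the current message that are inserted or substituted
    relative to the previous message; deletions are ignored, since dropping
    words is how conventions naturally shorten. Normalizing by the current
    message length is an assumption of this sketch.
    """
    prev, curr = prev_msg.lower().split(), curr_msg.lower().split()
    if not curr:
        return 0.0
    novel = 0
    for op, i1, i2, j1, j2 in SequenceMatcher(None, prev, curr).get_opcodes():
        if op in ("replace", "insert"):  # new or swapped words in curr
            novel += j2 - j1
        # "delete" and "equal" spans contribute nothing
    return novel / len(curr)

# The shortened reference reuses earlier words, so its WNR stays low:
print(word_novelty_rate(
    "the medicine for my back pain in a small blue bottle",
    "my back meds"))  # "meds" is the only novel word -> 1/3
```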
This finding is a concrete instantiation of a broader pattern. As argued in "Why do language models fail at communicative optimization?", communicative efficiency through convention formation is precisely the kind of optimization principle that training on form alone cannot produce. Convention formation requires modeling the listener's cognitive state and adjusting accordingly, a functional competence that next-token prediction does not select for.
A training-time solution has now been demonstrated. As "Can we teach LLMs to form linguistic conventions in context?" shows, the convention formation gap is addressable through targeted post-training rather than architectural redesign. The approach (heuristically extracting coreference chains from TV scripts, constructing DPO preference pairs from them, and adding a [remention] planning token) produces general in-context convention formation: the model spontaneously shortens references as the interaction progresses, precisely the capability the ICCA framework found missing.
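A hedged sketch of what one such preference pair might look like. The field names, the placement of the [remention] token, and the helper below are illustrative assumptions rather than the paper's published recipe.

```python
# Sketch of one DPO preference pair under this recipe: prefer a condensed
# re-mention (cued by a planning token) over repeating the full description
# once the entity has already been introduced. Field names are assumptions.
def build_preference_pair(context: str, full_mention: str, short_mention: str) -> dict:
    return {
        "prompt": context,                         # dialogue so far, already contains full_mention
        "chosen": f"[remention] {short_mention}",  # condensed re-reference, e.g. "my back meds"
        "rejected": full_mention,                  # verbatim repetition of the long description
    }

pair = build_preference_pair(
    context=(
        "Patient: I need a refill of the medicine for my back pain in a small "
        "blue bottle.\nPharmacist: Sure. Anything else?\nPatient:"
    ),
    full_mention="the medicine for my back pain in a small blue bottle",
    short_mention="my back meds",
)
```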
Source: Conversation Topics Dialog, Conversation Architecture Structure
Related concepts in this collection
- Why do language models fail at communicative optimization?
  LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?
  Relation: convention formation is a communicative optimization principle that statistical learning misses.
- Do language models actually build shared understanding in conversation?
  When LLMs respond fluently to prompts, do they perform the communicative work humans do to establish mutual understanding? Research suggests they skip the grounding acts that make dialogue reliable.
  Relation: convention formation is the process of building common ground in action; models that don't adapt don't build common ground.
- Are language models developing real functional competence or just formal competence?
  Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms. Can next-token prediction alone produce both, or does it leave functional competence behind?
  Relation: convention formation requires functional competence.
- Can we teach LLMs to form linguistic conventions in context?
  Humans naturally shorten references as conversations progress, but LLMs don't adapt their language for efficiency even when they understand their partners do. Can training on coreference patterns teach this convention-forming behavior?
  Relation: the training-time solution to the convention formation gap.
- Why don't conversational AI systems mirror their users' word choices?
  Explores whether current dialogue models exhibit lexical entrainment (the human tendency to align vocabulary with conversation partners) and what's needed to bridge this gap in AI communication.
  Relation: sibling capability gap; convention formation (becoming more efficient) and lexical entrainment (adopting a partner's vocabulary) are two manifestations of the same absent capacity for interaction-adaptive language production.
- Can communication pressure drive agents to learn shared abstractions?
  Under what conditions do AI agents develop compact, efficient shared languages? This explores whether cooperative task pressure (rather than explicit optimization) naturally drives abstraction formation, mirroring human collaborative communication.
  Relation: the missing ingredient; ACE shows that cooperative communication pressure drives efficiency adaptation, which is exactly what the ICCA framework found absent in LLMs, which lack the communicative pressure that would drive convention formation.
Original note title
multimodal LLMs do not spontaneously adapt their language for communication efficiency despite understanding their interlocutors' increasingly efficient language