Why do language models fail at communicative optimization?
LLMs excel at learning surface statistical patterns from text but struggle with deeper principles of how language achieves efficient communication. What distinguishes these two types of linguistic knowledge?