Can language models learn meaning from text patterns alone?
Explores whether training on form alone—predicting the next word from prior words—could ever give language models access to communicative intent and genuine semantic understanding.
Bender & Koller (2020) make a specific structural argument, not just an intuitive one. Meaning is defined as the relation M ⊆ E × I: pairs of natural language expressions and the communicative intents they can be used to evoke. Understanding an expression e means retrieving the intent i it was used to evoke. But communicative intents are about something outside of language, and form alone (marks on a page, pixels, bytes) is insufficient to recover them.
The reasoning: without access to a mechanism for hypothesizing and testing underlying communicative intents, reconstructing them from form alone is impossible. Language modeling predicts the next token given prior tokens — purely a form-to-form operation. The training signal provides no information about what intents the forms were used to evoke.
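To make the contrast concrete, here is a minimal sketch, assuming a toy bigram model and illustrative names (`Expression`, `Intent`, `Token`, the sample corpus) that are not from the paper: the standard the paper sets for understanding is a mapping from expressions to intents, while the only quantities the language-modeling objective ever touches are forms.

```python
import math
from collections import Counter
from typing import Callable

# The paper's standard for understanding: a mapping from expressions to
# communicative intents, where the intent is something outside the symbol
# system (a referent, a goal, a state of the world). This is the relation
# M ⊆ E × I read as retrieval of i given e.
Expression = str
Intent = object  # placeholder: nothing in the training data instantiates this
understand: Callable[[Expression], Intent]  # declared, never learnable from form alone

# What next-token prediction actually supervises: form-to-form statistics.
# A toy bigram model makes the structural point; every quantity below is
# computed from co-occurrences of tokens and nothing else.
Token = str

def train_bigram(corpus: list[list[Token]]) -> dict[tuple[Token, Token], float]:
    """Estimate P(next | prev) from token co-occurrence counts alone."""
    pair_counts = Counter((s[t - 1], s[t]) for s in corpus for t in range(1, len(s)))
    prev_counts = Counter(prev for prev, _ in pair_counts.elements())
    return {pair: count / prev_counts[pair[0]] for pair, count in pair_counts.items()}

def lm_loss(corpus: list[list[Token]], probs: dict[tuple[Token, Token], float]) -> float:
    """Negative log-likelihood of the observed forms. No term refers to an intent."""
    return -sum(
        math.log(probs.get((s[t - 1], s[t]), 1e-9))
        for s in corpus
        for t in range(1, len(s))
    )

corpus = [["the", "cat", "sat"], ["the", "cat", "ran"]]
probs = train_bigram(corpus)
print(round(lm_loss(corpus, probs), 3))  # 1.386: a finite loss, computed entirely from form
```

Scaling the model up changes how well the form-to-form mapping is estimated, not what kind of mapping it is; the loss never mentions an intent, which is the sense in which the limitation is independent of scale.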
Human language acquisition illustrates the point by contrast. What is critical for meaning acquisition is not just interaction but joint attention — situations where child and caregiver both attend to the same thing and are both aware of this fact. Learning meaning requires the ability to be aware of what another person is attending to and guess what they are intending to communicate. Intersubjectivity is not incidental to language learning; it is its mechanism.
Harnad's formulation of the symbol grounding problem makes the same point: a non-speaker of Chinese cannot learn the meanings of Chinese words from Chinese dictionary definitions alone. You need something outside the symbol system to anchor the symbols. Form-to-form prediction cannot provide this anchor.
Mutual understanding is structurally unavailable, even in conversational media. The form-only training constraint has a downstream consequence that holds even when an AI operates in a conversational channel: seeking mutual understanding with the user is structurally unavailable to an LLM, because mutual understanding requires the intersubjectivity that form-only training cannot provide. The communication is one-way even when it occurs on a medium designed for mediated social interaction. This reframes AI social-media posts as a specific genre: indirect discourse, a form of writing even when it appears in an interactive environment. The user reads the post and the medium formally supports reply, but the AI is not available for the second turn that would close a loop of mutual understanding, and was never going to be. The channel looks communicative; the content is monological writing deposited in a conversational shape.
This is distinct from the claim that LLMs "have no understanding." It is the more precise claim that the training mechanism — string prediction — is in principle incapable of providing the signal that meaning acquisition requires, regardless of scale.
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Do LLMs develop the same kind of mind as humans?
  Explores whether LLMs and humans share the intersubjective linguistic training that shapes cognition, and whether that shared training produces equivalent forms of agency and reflexivity.
  Habermas framing of the same gap from a different angle: shared substrate, absent participatory mechanism.
- What makes linguistic agency impossible for language models?
  From an enactive perspective, does linguistic agency require embodied participation and real stakes that LLMs fundamentally lack? This matters because it challenges whether LLMs can truly engage in language or only generate text.
  The enactive cognitive science version of the same absence.
- Can models pass tests while missing the actual grammar?
  Do language models succeed on grammatical benchmarks by learning surface patterns rather than structural rules? This matters because correct outputs may hide reliance on shallow heuristics that fail on novel structures.
  What is learned from form alone: surface regularities, not structural competence.
Original note title
language models trained on form alone cannot acquire meaning because meaning requires joint attention and intersubjectivity