Do transformer static embeddings actually encode semantic meaning?
Explores whether the fixed word embeddings that enter transformer networks contain rich semantic information or serve only as shallow placeholders. This addresses a longstanding debate in philosophy of language about whether word meanings are stored or constructed.
The transformer architecture creates two distinct representations for every word: a static token embedding (input to self-attention) and a contextualized embedding (output of self-attention). The static embedding is the invariant entry for each word in the model's vocabulary. The question is whether these static embeddings carry meaningful semantic information or are mere placeholders that get enriched only during self-attention.
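To make the distinction concrete, here is a minimal sketch, assuming the Hugging Face `transformers` and `torch` packages, of how both representations can be read out of RoBERTa-base: the static vector is just a row of the input embedding matrix, while the contextualized vector requires a forward pass.

```python
# Minimal sketch (assumes `transformers` and `torch` are installed):
# extract the static and contextualized representations of one word.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModel.from_pretrained("roberta-base")

# Static embedding: a fixed row of the input embedding matrix
# (~50k vocabulary entries x 768 dimensions), identical in every context.
token_id = tokenizer.convert_tokens_to_ids("Ġbicycle")  # RoBERTa's BPE marks word-initial tokens with Ġ
static_vec = model.embeddings.word_embeddings.weight[token_id]

# Contextualized embedding: the same word's vector after self-attention,
# now dependent on the surrounding sentence.
inputs = tokenizer("She rode her bicycle to work.", return_tensors="pt")
with torch.no_grad():
    contextual = model(**inputs).last_hidden_state  # shape [1, seq_len, 768]
```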
The "meaning eliminativist" hypothesis — defended in psycholinguistics by Elman (2004) and philosophy by Rayo (2013) and Recanati (2003) — holds that static word meanings are redundant. Applied to LLMs, this would mean static embeddings store only morphological and syntactic cues, with semantic information introduced entirely at the self-attention layer. Given that embeddings have only 768 parameters per token in RoBERTa-base versus tens of millions in the attention and feed-forward layers, there is architectural reason to expect semantic information might be deferred.
The empirical evidence rules this out. Clustering RoBERTa-base's ~50,000 static token embeddings into 200 clusters reveals sensitivity to five psycholinguistic measures (a code sketch of this kind of analysis follows the list):
- Valence — pleasantness of the concept (from the Mehrabian three-dimensional emotion model)
- Concreteness — perceptible entity vs. abstract notion (on the standard 1-5 rating scale, "bicycle" scores 4.89 while "justice" sits near the low end)
- Iconicity — perceived resemblance between form and meaning (challenging the arbitrariness-of-the-sign thesis)
- Taboo — social transgression load of the term
- Age of acquisition — when the word is typically learned
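A minimal sketch of such a clustering analysis, assuming scikit-learn and reusing `model` and `tokenizer` from above. K-means is one natural choice; the note does not specify the original clustering method. The `norms` dictionary is a hypothetical placeholder; a real analysis would map published psycholinguistic ratings (e.g., Brysbaert et al. concreteness norms) onto the vocabulary.

```python
# Sketch (assumes scikit-learn): k-means over the static embedding matrix,
# then per-cluster means of a psycholinguistic norm.
import numpy as np
from sklearn.cluster import KMeans

emb_matrix = model.embeddings.word_embeddings.weight.detach().numpy()  # [~50k, 768]
labels = KMeans(n_clusters=200, n_init=10, random_state=0).fit_predict(emb_matrix)

# Hypothetical placeholder: token -> norm score. A real run would load
# published ratings keyed to RoBERTa's BPE vocabulary (Ġ marks word starts).
norms = {"Ġbicycle": 4.89}

vocab = tokenizer.get_vocab()  # token string -> id
per_cluster = {}
for tok, idx in vocab.items():
    if tok in norms:
        per_cluster.setdefault(labels[idx], []).append(norms[tok])
cluster_means = {c: np.mean(v) for c, v in per_cluster.items()}
# Large spread across cluster means indicates the static embeddings are
# sensitive to the measure before any attention operates.
```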
The iconicity finding is particularly striking because detecting it requires access to surface properties, semantic properties, and recognition of resemblance between them — all within the static embedding before any attention mechanism operates.
This means LLMs implement something analogous to a lexical store: each word has an entry containing genuine semantic information that is then modulated by context during self-attention. The parallel to the philosophy-of-language debate is direct: static embeddings are rich entries that get contextually adjusted, not minimal cores that get built from scratch each time.
The implication for mechanistic interpretability: semantic information is distributed across two levels — the token embedding layer and the contextualized layers — and analysis that focuses only on intermediate or final representations may miss what was already encoded at input.
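One practical consequence is that a layer-wise interpretability pipeline should include a layer-0 baseline. The sketch below, reusing `emb_matrix` and `tokenizer` from above, fits a linear probe on the static embeddings alone; the word list and ratings are made-up illustrative placeholders, not real norm data. Non-trivial held-out R² on real norms would mean the property was encoded before the first attention layer.

```python
# Sketch: a linear probe on the static embeddings alone, as the layer-0
# baseline for a layer-wise analysis (assumes scikit-learn; reuses
# `emb_matrix` and `tokenizer` from the snippets above).
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

# Hypothetical word list with placeholder ratings; a real probe would use
# published psycholinguistic norms for thousands of words.
words = ["bicycle", "justice", "table", "truth", "dog", "anger"]
scores = [4.9, 1.5, 4.9, 1.4, 4.9, 2.3]  # illustrative values, not real data

ids = [tokenizer.convert_tokens_to_ids("Ġ" + w) for w in words]
X = emb_matrix[ids]
r2 = cross_val_score(Ridge(alpha=1.0), X, scores, cv=3, scoring="r2")
# Held-out R^2 above chance would show the semantic property was already
# present at the input layer, before any contextualization.
```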
Related concepts in this collection
- Does semantic grounding in language models come in degrees? Rather than asking whether LLMs truly understand meaning, this note asks whether grounding is a multi-dimensional spectrum, reframing the sterile understand/don't-understand debate into measurable, distinct capacities. Relation: its tri-partite taxonomy operates at the contextualized level; static embeddings provide the base material that functional grounding then operates on.
- Are language models developing real functional competence or just formal competence? Neuroscience suggests formal linguistic competence (rules and patterns) and functional competence (real-world understanding) rely on different brain mechanisms; can next-token prediction alone produce both, or does it leave functional competence behind? Relation: the static/contextualized split in transformers may parallel the formal/functional distinction, with formal competence in the embeddings and functional competence requiring attention.
- Why does reasoning training help math but hurt medical tasks? Explores whether reasoning and knowledge rely on different network mechanisms, and why training one might undermine the other across domains. Relation: extends this note; semantic knowledge begins even before the first layer, in the embedding matrix itself.
- How do language models encode syntactic relations geometrically? Asks whether LLM embeddings use distance alone, or also direction, to represent syntax, and whether neural networks can spontaneously develop symbolic-compatible geometric structures. Relation: a complementary layered discovery; static embeddings encode semantic features at the embedding layer while the Polar Probe reveals syntactic structure across transformer layers, a semantic base beneath a syntactic superstructure.
Original note title: transformer static embeddings encode rich semantic information including valence concreteness iconicity and taboo — ruling out meaning eliminativism