Word Meanings in Transformer Language Models
We investigate how word meanings are represented in transformer language models. Specifically, we focus on whether transformer models employ something analogous to a lexical store - where each word has an entry that contains semantic information. To do this, we extracted the token embedding space of RoBERTa-base and k-means clustered it into 200 clusters. In our first study, we manually inspected the resultant clusters to consider whether they are sensitive to semantic information. In our second study, we tested whether the clusters are sensitive to five psycholinguistic measures: valence, concreteness, iconicity, taboo, and age of acquisition. Overall, our findings were very positive - there is a wide variety of semantic information encoded within the token embedding space. This serves to rule out certain “meaning eliminativist” hypotheses about how transformer LLMs process semantic information.
Do large language models (LLMs) understand the meanings of the words that they use? When we apply terms like “understand” - terms that are typically applied in the human case - to an artificial system, we inevitably enter into a debate with an anthropomorphised framing, where a skeptical answer looms whenever LLMs fail to possess some feature that seems important in the human case. An alternative approach is to stipulate that LLMs understand their words in some sense and then ask what that understanding consists in. We can call it “understanding*” or “AI-understanding” if we like, but for the purposes of this paper we will stick with the original term. Even if attributing this kind of understanding to LLMs does not equate to attributing human understanding, an investigation into the way LLMs understand the words they use could still prove useful in the human case. One way in which this could occur is that LLMs could help answer ‘how possibly’ questions about ways in which linguistic information could be processed, even if these are not the ways it is processed in the human case (Grindrod, forthcoming).
In this paper, we focus specifically on how lexical semantic information is stored and employed within a particular kind of large language model architecture - the transformer architecture (Vaswani et al., 2017). The transformer architecture is one of the reasons for the remarkable progress seen in language model technology and is still the basis for the current state of the art. One of the fascinating aspects of the transformer architecture is that the self-attention mechanism at its heart gives rise to two distinct representations for any given word it processes. On the one hand, there is the “token embedding” or “static embedding” that is invariantly assigned to each word in the LLM's vocabulary and that (once combined with a positional embedding through vector addition) serves as input to the self-attention mechanism. On the other hand, there is the “contextualised embedding” that is the output of the self-attention procedure and that serves as a representation of the word as it was used in the input text. One of the key successes of the transformer architecture is its ability to represent a word as it is used in a particular context, and the contextualised embedding plays this role.
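To make the distinction concrete, the following sketch (ours, for illustration - not code from the paper) shows how both kinds of representation can be read off RoBERTa-base through Hugging Face's transformers package; the example sentence is arbitrary.

```python
# Illustrative sketch: static vs. contextualised embeddings in RoBERTa-base.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")
model.eval()

inputs = tokenizer("The bank raised its interest rates.", return_tensors="pt")

# Static (token) embeddings: one fixed 768-dimensional vector per vocabulary
# item, looked up from the input embedding matrix before self-attention.
static = model.embeddings.word_embeddings(inputs["input_ids"])

# Contextualised embeddings: the output of the stack of self-attention layers,
# one vector per token as it is used in this particular sentence.
with torch.no_grad():
    contextual = model(**inputs).last_hidden_state

print(static.shape, contextual.shape)  # both (1, sequence_length, 768)
```

The static vectors for a word such as “bank” are identical wherever it occurs; the contextualised vectors differ from sentence to sentence.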
A long-standing debate in philosophy of language concerns the extent to which the meaning of a word as used on a particular occasion is determined by its invariant word meaning on the one hand and the context in which it is used on the other (Borg, 2004; Cappelen & Lepore, 2005; Travis, 1997; Wittgenstein, 1953). The distinction between static and contextualised embeddings within LLMs leads to an analogous question: to what extent is a word's contextualised embedding determined by the word's static embedding? One possibility is that the static embeddings are rich with semantic information and that much of this is retained in the contextualised embeddings. Another possibility is that, as far as semantic properties go, the static embeddings merely serve as placeholders, with semantic information being introduced somewhere within the self-attention mechanism.
We approach this question through an empirical investigation of the information stored within the static embeddings. We extract the static embeddings from RoBERTa-base, an open-source model available through Hugging Face's transformers package (Liu et al., 2019). We perform a cluster analysis on the static embeddings and then manually inspect the clusters to investigate the static embedding space. We then test whether the arrangement of the clusters is sensitive to a range of psycholinguistic measures, as a way of testing whether the static embedding space is sensitive to semantic information. Our findings show that the static embedding space is in fact rich with a range of semantic information. LLMs succeed in understanding via the use of a kind of lexical store, where semantic information is encoded for each word in their vocabulary.
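Since the pipeline is simple, it is worth sketching what it amounts to in practice. The snippet below is a rough reconstruction of the kind of procedure described above (not the authors' released code), using scikit-learn's KMeans with 200 clusters over RoBERTa-base's static embedding matrix; the cluster index chosen for inspection is arbitrary.

```python
# Rough reconstruction: cluster RoBERTa-base's static embeddings with k-means.
import numpy as np
from sklearn.cluster import KMeans
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

# The full static embedding matrix: one 768-dimensional row per vocabulary item.
embeddings = model.embeddings.word_embeddings.weight.detach().numpy()  # (50265, 768)

kmeans = KMeans(n_clusters=200, n_init=10, random_state=0).fit(embeddings)

# Map a cluster back to readable tokens for manual inspection.
id_to_token = {idx: tok for tok, idx in tokenizer.get_vocab().items()}
some_cluster = [id_to_token[i] for i in np.where(kmeans.labels_ == 42)[0][:20]]
print(some_cluster)
```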
The paper is structured as follows. In section 2, we provide a brief informal overview of the transformer architecture and of the role of static and contextualised embeddings. Then in section 3, we frame our investigation in terms of the radical contextualism debate within philosophy of language. Following Grindrod (forthcoming), we show that our investigation serves to test a position analogous to the “meaning eliminativism” labelled as such by Recanati (2003) and arguably defended by Rayo (2013) and Elman (2004). We then present our cluster analysis in section 4, along with the findings of our first study, a manual inspection of the clusters. In section 5, we present the findings of our second study, where we test whether the clusters are sensitive to a range of psycholinguistic measures.
We focus here on the static embeddings that serve as input to the self-attention procedure.
As stated in the introduction, the distinction between the static embedding and the contextualised embedding maps fairly clearly onto an intuitive distinction between a word's meaning and what a word means when used on a particular occasion. An initially intuitive view is that the relation between these two notions is or approaches identity: what a word means on a particular occasion of use is determined completely by its invariant meaning. But this has been challenged most notably by contextualists (Recanati, 2003; Travis, 1997). They argue that (nearly) all words vary in terms of what they contribute to a sentence meaning on a particular occasion of use, and that there is a wide, possibly open-ended, range of contextual factors that can determine this. Assuming that such variation in usage is right, it is then controversial what implications this has for word meaning. Some argue that word meanings are nevertheless rich in information, even if they are subsequently modulated when used. Others argue that word meanings must be some minimal core that is subsequently enriched on each occasion. Perhaps most radically, some have suggested that the notion of a static word meaning is redundant: that utterance meaning can be generated without some dedicated store of semantic information for each word. Recanati (2003) labels such a view “meaning eliminativism”, and it is a view that has arguably been defended in psycholinguistics by Elman (2004) and in philosophy by Rayo (2013).

Within LLMs, there is a straightforward way in which the meaning eliminativist view could be realised (Grindrod, forthcoming). It may turn out that the static embeddings contain little in the way of semantic information: perhaps they serve as mere placeholders, or perhaps they only contain information about morphology and syntax. But what reason might there be for the LLM to take this approach? On the one hand, we should consider the wide array of information that any given word has associated with it, including morphological, phonological, syntactic, semantic, and pragmatic information. Given this, combined with the fact that the embedding space presumably has a limit on the amount of information it can store regarding each word, it may be that some semantic information, particularly information that is context-sensitive, is only introduced at the self-attention procedure. The relative size of the model speaks in favour of this point as well: as noted earlier, the embeddings for each word form a relatively small part of the overall model; in RoBERTa-base each embedding has only 768 parameters, compared to the tens of millions of parameters contained within the self-attention and feed-forward layers.
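The size comparison in the last sentence can be checked directly from the model itself; the snippet below is purely illustrative and plays no role in the studies reported later.

```python
# Illustrative check of the size comparison: per-word embedding size versus
# the parameters in the self-attention and feed-forward (encoder) layers.
from transformers import RobertaModel

model = RobertaModel.from_pretrained("roberta-base")

embedding_matrix = model.embeddings.word_embeddings.weight
per_word = embedding_matrix.shape[1]                                  # 768 per vocabulary item
embedding_total = embedding_matrix.numel()                            # ~38.6M across the vocabulary
encoder_total = sum(p.numel() for p in model.encoder.parameters())    # ~85M in the 12 encoder layers

print(per_word, embedding_total, encoder_total)
```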
We can be more specific in our inquiry, however, and ask not only whether there is semantic information contained within the static embeddings but also what kind of semantic information.
The valence of a term is roughly understood as its pleasantness. The notion is derived from a three-dimensional model of emotional states developed by Osgood et al. (1957) and Mehrabian (1980). According to this picture, along with valence, emotions can also vary according to arousal (how energetic and attentive an emotion feels) and dominance (how in control or controlled one feels).
Concreteness measures the extent to which a word refers to a perceptible entity rather than an abstract notion. So “bicycle” would have a high concreteness score (4.89) while “justice” would have a low concreteness score.
A view once widely held is that the relationship between a word's iconographic and phonological features on the one hand, and its semantic features on the other, is arbitrary, bar a few exceptions of onomatopoeia (e.g. “boom”, “fizzle”). More recently, this position has been challenged by the idea that iconicity - a “perceived resemblance between aspects of [...] form and aspects of [...] meaning” (Winter et al., 2024, p. 1640) - actually appears across a wide array of terms to varying extents. Iconicity is an intriguing property to consider with regard to LLMs because detecting it requires three things. First, you need access to a word's surface properties (phonological, iconographic, etc.); second, you need access to its semantic properties; and third, you need to recognize a resemblance between the two.
A type of meaning that has been of interest to philosophers of language in recent years is pejorative and slur meaning. As a form of meaning, it appears to behave uniquely insofar as the offensive content is still communicated even when such expressions are embedded in conditional sentences, speech act reports, and other sentential contexts.
For each attribute, we test whether each cluster is organized in a way that is sensitive to that attribute. More specifically, we ask how probable the observed distribution of the attribute within a cluster would be if the cluster were simply a random sample drawn from the distribution of that attribute across the entire dataset.
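One natural way to operationalise this is a Monte Carlo comparison of a cluster's attribute scores against equally sized random samples drawn from all rated tokens. The sketch below illustrates that idea with made-up valence scores; it is our reconstruction of the general approach, not necessarily the exact test used in the study.

```python
# Sketch of a Monte Carlo test for whether a cluster's mean attribute score is
# surprising given the attribute's distribution across the whole dataset.
import numpy as np

def cluster_sensitivity_p(cluster_scores, all_scores, n_samples=10_000, seed=0):
    """Two-sided Monte Carlo p-value for the cluster mean under random sampling."""
    rng = np.random.default_rng(seed)
    observed_gap = abs(np.mean(cluster_scores) - np.mean(all_scores))
    null_gaps = np.array([
        abs(rng.choice(all_scores, size=len(cluster_scores), replace=False).mean()
            - np.mean(all_scores))
        for _ in range(n_samples)
    ])
    return float(np.mean(null_gaps >= observed_gap))

# Example with synthetic valence ratings and a hypothetical high-valence cluster.
rng = np.random.default_rng(1)
all_valence = rng.normal(5.0, 1.5, size=20_000)
cluster_valence = rng.normal(6.0, 1.5, size=150)
print(cluster_sensitivity_p(cluster_valence, all_valence))  # small p-value expected
```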
The results from study 2 add further reason to think that the static embeddings encode a wide array of semantic information. It is notable, for instance, that we found clusters to be sensitive even to taboo, despite the fact that the number of tokens assigned a taboo score was relatively small.
There is still the possibility that the clusters in question are not actually sensitive to the attribute itself, but are instead sensitive to some correlated property.
We have shown in this paper that the static embeddings that serve as input to the self-attention procedure do not merely store syntactic and surface-level information about words (and word parts) but also store meaningful semantic information. We have also seen some reason to think that worldly information is stored at this level as well.