On the Relationship between Sentence Analogy Identification and Sentence Structure Encoding in Large Language Models

Paper · Source

The ability of Large Language Models (LLMs) to encode the syntactic and semantic structures of language is well examined in NLP. Analogy identification, in the form of word analogies, has also been studied extensively over the last decade of the language-modeling literature. In this work we specifically examine how LLMs' ability to capture sentence analogies (pairs of sentences that convey analogous meanings) varies with their ability to encode the syntactic and semantic structures of sentences. Through our analysis, we find that an LLM's ability to identify sentence analogies is positively correlated with its ability to encode the syntactic and semantic structures of sentences. In particular, the LLMs that capture syntactic structure better are also better at identifying sentence analogies.

details on the specific LLMs. Wijesiriwardene et al. (2023) introduced a taxonomy of analogies ranging from less complex word-level analogies to more complex paragraph-level analogies, and evaluated how each LLM performs at identifying analogies at each level of the taxonomy. An analogy is a pair of lexical items identified as holding similar meanings; the distance between a pair of analogous lexical items in the vector space should therefore be smaller than the distance between a non-analogous pair. The same authors identify Mahalanobis Distance (MD) (Mahalanobis, 1936) as a better measure of the distance between two analogous sentences in the vector space. In this work, therefore, the ability of each LLM to identify sentence analogies is represented by the mean MD calculated over the sentence-level datasets (levels 3, 4, and 5) of the analogy taxonomy.
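As a concrete illustration, the sketch below computes the mean MD over a batch of analogous sentence-embedding pairs. This is a minimal sketch, not the authors' implementation: the embeddings are assumed to come from any LLM encoder, the function name `mean_mahalanobis` is hypothetical, and the covariance matrix is estimated from the pooled sample (a pseudo-inverse guards against a singular covariance when the sample is small).

```python
import numpy as np

def mean_mahalanobis(pairs_a: np.ndarray, pairs_b: np.ndarray) -> float:
    """Mean Mahalanobis distance between analogous sentence-embedding pairs.

    pairs_a, pairs_b: (n, d) arrays; row i of each holds the embedding of
    one of the two sentences in the i-th analogous pair.
    """
    # Estimate the covariance of the embedding distribution from all
    # sentence vectors, then invert it. pinv (pseudo-inverse) is used
    # because the sample covariance may be singular when n < d.
    all_vecs = np.vstack([pairs_a, pairs_b])
    cov_inv = np.linalg.pinv(np.cov(all_vecs, rowvar=False))

    diffs = pairs_a - pairs_b                              # (n, d)
    # Per-pair squared Mahalanobis distance: diff^T * cov_inv * diff.
    sq = np.einsum("nd,de,ne->n", diffs, cov_inv, diffs)
    # Clamp tiny negative values caused by numerical error before sqrt.
    return float(np.mean(np.sqrt(np.maximum(sq, 0.0))))
```

Under this setup, a lower mean MD over an analogy dataset indicates that the model places analogous sentences closer together in its vector space, i.e., a stronger ability to identify sentence analogies.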

3.3 Large Language Models and their Ability to Capture Sentence Structures

Hewitt and Manning (2019) introduced a probing approach to evaluate whether syntax trees (sentence structures) are encoded in Language Models' (LMs') vector geometry. The probing model is trained on the train/dev/test splits of the Penn Treebank (Marcus et al., 1993) and tested on both BERT (Devlin et al., 2018) and ELMo (Peters et al., 2018). An LM's ability to capture sentence structure is quantified by how well its embeddings for a given sentence encode the gold parse tree provided in the Penn Treebank. The authors introduce a path distance metric and a path depth metric for evaluation. The distance metric captures the path length between each pair of words, measured by the Undirected Unlabeled Attachment Score (UUAS) and the average Spearman correlation between true and predicted distances (DSpr.). The depth metric evaluates the model's ability to identify a sentence's root, measured as root accuracy percentage, and its ability to recreate the word order from each word's depth in the parse tree, measured as Norm Spearman (NSpr.). We refer readers to Hewitt and Manning (2019) for further details on the technique and evaluation metrics.
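The core of the distance probe is a learned linear map B under which squared L2 distances between transformed word vectors approximate parse-tree distances. Below is a minimal PyTorch sketch under stated assumptions: the general recipe (linear transform, squared distances, length-normalized L1 loss) follows Hewitt and Manning (2019), but the class and function names (`DistanceProbe`, `probe_loss`), the probe rank, and the initialization scale are illustrative choices, not the authors' code.

```python
import torch

class DistanceProbe(torch.nn.Module):
    """Structural distance probe: a linear map B under which squared L2
    distances between word vectors approximate parse-tree distances."""

    def __init__(self, model_dim: int, probe_rank: int = 64):
        super().__init__()
        self.B = torch.nn.Parameter(torch.randn(model_dim, probe_rank) * 0.05)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (seq_len, model_dim) hidden states of one sentence from an LM.
        t = h @ self.B                               # (seq_len, probe_rank)
        diffs = t.unsqueeze(0) - t.unsqueeze(1)      # (seq, seq, probe_rank)
        return (diffs ** 2).sum(-1)                  # predicted squared distances

def probe_loss(pred_sq_dist: torch.Tensor, gold_tree_dist: torch.Tensor) -> torch.Tensor:
    # L1 loss between gold parse-tree distances and predicted squared
    # distances, normalized by squared sentence length.
    n = gold_tree_dist.size(0)
    return torch.abs(pred_sq_dist - gold_tree_dist).sum() / (n * n)
```

Given a trained probe, UUAS is computed by extracting a minimum spanning tree from the predicted distance matrix of each sentence and scoring its edges against the edges of the gold parse.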

In particular, LLMs that better capture syntactic structures show a higher correlation with analogy identification. In summary, this work explores how LLMs utilize knowledge of the semantic and syntactic structures of sentences to identify analogies.