Semantic Specialization for Knowledge-based Word Sense Disambiguation

Paper · arXiv 2304.11340 · Published April 22, 2023

A promising approach for knowledge-based Word Sense Disambiguation (WSD) is to select the sense whose contextualized embeddings computed for its definition sentence are closest to those computed for a target word in a given sentence. This approach relies on the similarity of the sense and context embeddings computed by a pre-trained language model. We propose a semantic specialization for WSD where contextualized embeddings are adapted to the WSD task using solely lexical knowledge. The key idea is, for a given sense, to bring semantically related senses and contexts closer and send different/unrelated senses farther away. We realize this idea as the joint optimization of the Attract-Repel objective for sense pairs and the self-training objective for context-sense pairs while controlling deviations from the original embeddings. The proposed method outperformed previous studies that adapt contextualized embeddings. It achieved state-of-the-art performance on knowledge-based WSD when combined with the reranking heuristic that uses the sense inventory.

The goal of this study is knowledge-based WSD: a variant of WSD that relies not on supervision data but only on lexical knowledge (e.g., a word ontology).

A promising approach is to select the sense that is nearest to the target word in the embedding space (Wang and Wang, 2020). Specifically, a pre-trained language model, typically BERT (Devlin et al., 2019), computes sense embeddings from definition sentences. Similarly, the target word is encoded into a context embedding for the given sentence. The model then predicts the sense of the target word by finding the sense embedding most similar to the context embedding.
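The nearest-neighbor selection step can be sketched as follows. This is a minimal illustration with tiny stand-in vectors, not the paper's implementation; the sense ids and 3-dimensional embeddings are invented for the example, and in practice the vectors would come from BERT.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def predict_sense(context_emb, sense_embs):
    """Return the candidate sense whose definition embedding is most
    similar to the context embedding of the target word.

    `sense_embs` maps a sense id to the (pre-computed) embedding of its
    definition sentence."""
    return max(sense_embs, key=lambda s: cosine(context_emb, sense_embs[s]))

# Toy example: 3-d stand-ins for contextualized embeddings.
ctx = np.array([1.0, 0.2, 0.0])          # embedding of "bank" in context
senses = {
    "bank%finance": np.array([0.9, 0.3, 0.1]),
    "bank%river":   np.array([-0.2, 1.0, 0.5]),
}
print(predict_sense(ctx, senses))        # prints bank%finance
```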

Our key idea is to 1) bring closer semantically related sense and context embeddings that convey the same meaning, and 2) send farther away unrelated and/or different senses that share the same surface form (Fig. 1-d). We formulate this idea as an Attract-Repel objective and a self-training objective. The main novelty is their joint optimization, which exploits their complementary nature: the former improves the distinguishability between senses, whereas the latter offers pseudo signals of context-sense associations. This combination has not been explored in previous methods.
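The self-training side of the joint objective can be sketched as follows: under the current embeddings, each context is paired with its nearest candidate sense, and these pairs serve as pseudo positive signals. The function name and data layout here are hypothetical; the paper's exact pairing procedure may differ.

```python
import numpy as np

def pseudo_label(context_embs, sense_embs):
    """Self-training signal: pair each context embedding with its nearest
    sense under the current model (a sketch; names are hypothetical).

    Returns a list of (context_index, sense_id) pseudo pairs that the
    joint objective can treat as positive context-sense associations."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    pairs = []
    for i, c in enumerate(context_embs):
        best = max(sense_embs, key=lambda s: cos(c, sense_embs[s]))
        pairs.append((i, best))
    return pairs
```

As the embeddings are specialized, the pseudo pairs are recomputed, so the Attract-Repel term and the self-training term reinforce each other during joint optimization.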

The Attract-Repel objective, inspired by Vulić and Mrkšić (2018), injects semantic relation knowledge into the similarity of sense pairs.
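A common realization of such an objective is a margin-based loss that attracts an anchor sense toward a semantically related sense and repels it from an unrelated one. The sketch below uses cosine similarity with a margin; this is one standard formulation, not necessarily the paper's exact loss.

```python
import numpy as np

def attract_repel_loss(anchor, positive, negative, margin=0.5):
    """Margin-based Attract-Repel term (illustrative form).

    Encourages the anchor sense embedding to be at least `margin` more
    similar to a related sense (`positive`, e.g. a synonym from the
    lexical knowledge base) than to an unrelated sense (`negative`)."""
    def cos(u, v):
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
    return max(0.0, margin - cos(anchor, positive) + cos(anchor, negative))
```

Minimizing this term over many sense pairs pulls related senses together (attract) and pushes unrelated senses apart (repel).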

Excessive deviation from the original embeddings may lead to inaccurate nearest-neighbor sense selection, which would cause a performance drop.
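Deviation from the pre-trained embeddings can be controlled with a regularization term that penalizes drift from the originals. The L2 penalty below is a common choice for this kind of constraint; the specific form and weighting used in the paper may differ.

```python
import numpy as np

def deviation_penalty(specialized, original, weight=1.0):
    """L2 penalty keeping a specialized embedding close to its
    pre-trained original (illustrative regularizer)."""
    return weight * float(np.sum((specialized - original) ** 2))
```

Adding this term to the joint objective lets the specialization adapt the embeddings to WSD while preserving the semantic structure learned during pre-training.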