A Survey on Lexical Ambiguity Detection and Word Sense Disambiguation
This paper explores techniques for understanding and resolving ambiguity in language within the field of natural language processing (NLP), highlighting the complexity of linguistic phenomena such as polysemy and homonymy and their implications for computational models. Focusing extensively on Word Sense Disambiguation (WSD), it outlines diverse approaches ranging from deep learning techniques to leveraging lexical resources and knowledge graphs like WordNet. The paper introduces cutting-edge methodologies such as word sense extension (WSE) and neurosymbolic approaches, which enhance disambiguation accuracy by predicting new word senses. It examines specific applications in biomedical disambiguation and language-specific optimization, and discusses the significance of cognitive metaphors in discourse analysis. The research identifies persistent challenges in the field, such as the scarcity of sense-annotated corpora and the complexity of informal clinical texts. It concludes by suggesting future directions, including the use of large language models, visual WSD, and multilingual WSD systems.
B. Neurosymbolic Methodology for WSD Systems
Another breakthrough in WSD comes from a novel neurosymbolic methodology [14], which surpasses the long-standing 80% F1-score ceiling and reaches 90%. This methodology integrates supervised learning with symbolic reasoning, exploiting hypernym relations and pre-trained word embeddings [15]. Such neurosymbolic approaches have opened new vistas for full-fledged multilingual WSD systems [15]. A recent study in Scientific Reports introduces a technique for creating word sense embeddings, especially for polysemous words [16]. By combining a bidirectional long short-term memory (BiLSTM) network with a self-attention mechanism, the method improves the accuracy of word sense induction and the quality of the resulting sense embeddings [16], significantly advancing the representation of word senses in computational linguistics. These developments reflect the dynamic nature of research in lexical ambiguity detection and WSD: the incorporation of novel methodologies such as WSE and neurosymbolic approaches, together with advances in word sense embeddings, highlights the field's ongoing evolution and its increasing sophistication in handling the complexities of language in NLP.
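To make the neurosymbolic idea concrete, the following is a minimal illustrative sketch, not the method of [14]: a neural component scores candidate senses by embedding similarity to the context, while a symbolic component filters senses through hypernym constraints. All embeddings, sense names, and hypernym sets below are invented toy data.

```python
import math

# Toy vectors standing in for pre-trained word embeddings
# (illustrative assumptions, not values from the surveyed work).
EMBEDDINGS = {
    "bank_river":   [0.9, 0.1, 0.0],
    "bank_finance": [0.1, 0.9, 0.1],
    "money":        [0.0, 0.9, 0.2],
}

# Symbolic knowledge: hypernym sets for each candidate sense.
HYPERNYMS = {
    "bank_river":   {"slope", "geological_formation"},
    "bank_finance": {"financial_institution", "institution"},
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def disambiguate(context_words, candidate_senses, required_hypernyms=None):
    """Score each sense neurally (embedding similarity to the context words)
    and filter symbolically (hypernym constraints); return the best sense."""
    best, best_score = None, float("-inf")
    for sense in candidate_senses:
        # Symbolic step: discard senses that violate the hypernym constraint.
        if required_hypernyms and not (HYPERNYMS[sense] & required_hypernyms):
            continue
        # Neural step: mean similarity between the sense and the context.
        score = sum(cosine(EMBEDDINGS[sense], EMBEDDINGS[w])
                    for w in context_words) / len(context_words)
        if score > best_score:
            best, best_score = sense, score
    return best

# Purely neural scoring picks the financial sense for a "money" context;
# a symbolic constraint can override that preference.
print(disambiguate(["money"], ["bank_river", "bank_finance"]))
print(disambiguate(["money"], ["bank_river", "bank_finance"],
                   required_hypernyms={"geological_formation"}))
```

The point of the sketch is the division of labor: the embedding similarity supplies a soft, learned preference, while the hypernym check imposes a hard, interpretable constraint from the knowledge base.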
One notable example is the study by [17], which introduces a WSD technique that tailors contextual embeddings specifically for the task while relying exclusively on lexical information. Its central idea is to narrow the semantic gap between related senses and contexts while pushing apart dissimilar or unrelated ones. This strategy has outperformed previous methods that modify contextual embeddings.
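The pull-together/push-apart objective described above can be sketched as a triplet-style hinge loss; this is an illustrative reconstruction with invented toy vectors, not the exact formulation of [17].

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def triplet_loss(context, positive_sense, negative_sense, margin=0.2):
    """Hinge loss that reaches zero once the correct sense is closer to
    the context than the incorrect sense by at least `margin`."""
    return max(0.0, margin
               + cosine(context, negative_sense)
               - cosine(context, positive_sense))

# Toy embeddings (illustrative values only).
context = [0.7, 0.7, 0.0]   # e.g. a "deposited cash at the ..." context
pos     = [0.6, 0.8, 0.0]   # financial sense of "bank"
neg     = [0.9, 0.0, 0.4]   # river sense of "bank"

loss = triplet_loss(context, pos, neg)

# At inference time, disambiguation reduces to nearest-sense lookup.
pred = max([("finance", pos), ("river", neg)],
           key=lambda s: cosine(context, s[1]))[0]
print(loss, pred)
```

During training, gradients of such a loss move related sense and context embeddings together and unrelated ones apart; once the margin is satisfied, the loss is zero and the pair is left alone.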