Word sense induction (WSI) is widely known as the “unsupervised version” of word sense disambiguation (WSD). The problem is stated as follows: given a target word (e.g., “cold”) and a collection of sentences that use the word (e.g., “I caught a cold”, “The weather is cold”), cluster the sentences according to the sense/meaning in which the target word is used. We do not need to know the sense/meaning of each cluster, but all sentences within a cluster should use the target word in the same sense.
Description from NLP Progress
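The task setup can be illustrated with a minimal sketch. It assumes a deliberately naive representation (each sentence becomes the set of its context words) and single-link clustering on Jaccard overlap; the stopword list, the threshold, and the function names are illustrative assumptions and are not taken from any of the systems listed on this page.

```python
from itertools import combinations

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"i", "a", "the", "is", "was", "of", "and", "in", "it"}


def context_words(sentence, target):
    """Lowercased tokens of the sentence, minus the target word and stopwords."""
    tokens = [t.strip(".,!?").lower() for t in sentence.split()]
    return {t for t in tokens if t and t != target and t not in STOPWORDS}


def jaccard(a, b):
    """Jaccard overlap between two sets of context words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def induce_senses(sentences, target, threshold=0.15):
    """Group sentence indices whose contexts overlap (single-link, union-find)."""
    contexts = [context_words(s, target) for s in sentences]
    parent = list(range(len(sentences)))

    def find(i):
        # Follow parent pointers to the cluster representative.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(sentences)), 2):
        if jaccard(contexts[i], contexts[j]) >= threshold:
            parent[find(j)] = find(i)

    clusters = {}
    for i in range(len(sentences)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


sentences = [
    "I caught a cold",
    "The weather is cold",
    "She caught a bad cold last week",
    "Cold weather makes the roads icy",
]
print(induce_senses(sentences, "cold"))
```

Note that the output is only a partition of the sentences: the clusters carry no sense labels, which matches the task definition above. Real systems replace the bag-of-words contexts with learned representations.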
The key idea is to utilize word sememes to capture exact meanings of a word within specific contexts accurately.
The recently proposed Skip-gram model is a powerful method for learning high-dimensional word representations that capture rich semantic relationships between words.
Word sense induction (WSI) is the task of unsupervised clustering of word usages within a sentence to distinguish senses.
Ranked #1 on Word Sense Induction on SemEval 2010 WSI
This paper presents a new graph-based approach that induces synsets using synonymy dictionaries and word embeddings.
An established method for Word Sense Induction (WSI) uses a language model to predict probable substitutes for target words, and induces senses by clustering these resulting substitute vectors.
Ranked #3 on Word Sense Induction on SemEval 2013
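The substitute-based idea can be sketched as follows. This is a minimal illustration, not the method of any particular paper: the substitute probabilities in `TOY_SUBSTITUTES` are hand-written toy values standing in for what a language model might predict for each occurrence of “cold”, and the greedy centroid clustering and the threshold are assumptions chosen for brevity.

```python
import math

# Toy stand-in for language-model output (hypothetical values): for each
# occurrence of the target word "cold", a sparse vector of substitute
# words with their predicted probabilities.
TOY_SUBSTITUTES = [
    {"flu": 0.6, "chill": 0.3, "cough": 0.1},        # "I caught a cold"
    {"chilly": 0.5, "freezing": 0.4, "cool": 0.1},   # "The weather is cold"
    {"flu": 0.5, "cough": 0.4, "virus": 0.1},
    {"freezing": 0.6, "chilly": 0.3, "icy": 0.1},
]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_substitute_vectors(vectors, threshold=0.5):
    """Greedily assign each vector to its best-matching cluster, or open a new one."""
    centroids = []  # one summed substitute vector per induced sense
    labels = []
    for vec in vectors:
        scores = [cosine(vec, c) for c in centroids]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # Fold the new instance into the matching cluster's centroid.
            for k, w in vec.items():
                centroids[best][k] = centroids[best].get(k, 0.0) + w
        else:
            centroids.append(dict(vec))
            best = len(centroids) - 1
        labels.append(best)
    return labels


print(cluster_substitute_vectors(TOY_SUBSTITUTES))
```

Occurrences whose predicted substitutes overlap (“flu”, “cough”) end up in one induced sense, and those with weather-related substitutes in another. In practice the substitute vectors would come from an actual language model and a more robust clustering algorithm would be used.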
Evaluating these methods is also problematic, as rigorous quantitative evaluations in this space are limited, especially when compared with single-sense embeddings.
Thus, we aim to eliminate these requirements and solve the sense granularity problem by proposing AutoSense, a latent variable model based on two observations: (1) senses are represented as a distribution over topics, and (2) senses generate pairings between the target word and its neighboring word.
Ranked #2 on Word Sense Induction on SemEval 2010 WSI
The paper reports our participation in the shared task on word sense induction and disambiguation for the Russian language (RUSSE-2018).