Therefore, we implemented a Transformer-based encoder-decoder neural system that can use the output of a pre-trained language model as input embeddings, and we compared its performance under three configurations: 1) without any pre-trained language model (constrained), 2) using a language model trained on the monolingual parts of the allowed English-Czech data (constrained), and 3) using a language model trained on a large quantity of external monolingual data (unconstrained).
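A minimal PyTorch sketch of this setup (not the authors' implementation): a frozen pre-trained language model, assumed to return hidden states of width `d_lm`, is projected to the model width and fed to the encoder in place of learned source embeddings. All names (`lm`, `d_lm`, `LMFedTransformer`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LMFedTransformer(nn.Module):
    def __init__(self, lm, d_lm, d_model, tgt_vocab_size):
        super().__init__()
        self.lm = lm                      # frozen pre-trained language model
        for p in self.lm.parameters():
            p.requires_grad = False
        self.proj = nn.Linear(d_lm, d_model)          # map LM states to model width
        self.tgt_embed = nn.Embedding(tgt_vocab_size, d_model)
        self.transformer = nn.Transformer(d_model=d_model, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src_ids, tgt_ids):
        with torch.no_grad():
            # assumption: lm(src_ids) returns hidden states (batch, src_len, d_lm)
            lm_states = self.lm(src_ids)
        src = self.proj(lm_states)        # LM output used as source embeddings
        tgt = self.tgt_embed(tgt_ids)
        mask = self.transformer.generate_square_subsequent_mask(tgt_ids.size(1))
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out(hidden)
```

The constrained configuration without a language model corresponds to replacing `self.proj(lm_states)` with an ordinary learned source embedding; the two LM configurations differ only in the data the frozen `lm` was trained on.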
In this article, we tackle the issue of the limited quantity of manually sense-annotated corpora for the task of word sense disambiguation by exploiting the semantic relationships between senses, such as synonymy, hypernymy, and hyponymy, in order to compress the sense vocabulary of Princeton WordNet and thus reduce the number of different sense tags that must be observed to disambiguate all words of the lexical database.
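One way to illustrate the idea with NLTK's WordNet interface (a simplified sketch, not the paper's exact algorithm): collapse each synset onto a hypernym ancestor so that several related senses share one coarser tag, shrinking the tag vocabulary. The paper's method additionally guarantees that no two senses of the same word merge into one tag; that constraint is omitted here for brevity.

```python
from nltk.corpus import wordnet as wn

def ancestor_tag(synset, depth=1):
    """Replace a synset by an ancestor `depth` hypernym links up (when one
    exists), so that semantically related senses collapse onto a shared tag."""
    tag = synset
    for _ in range(depth):
        parents = tag.hypernyms()
        if not parents:
            break
        tag = parents[0]          # follow the first hypernym path
    return tag

# All noun synsets of 'bank' before and after this naive compression:
senses = wn.synsets('bank', pos=wn.NOUN)
tags = {ancestor_tag(s) for s in senses}
print(f"{len(senses)} sense tags -> {len(tags)} compressed tags")
```

Because distinct synsets map to shared ancestor tags, a supervised tagger needs to observe far fewer distinct labels in training data to cover the full lexical database.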
Our method leads to state-of-the-art results on most WSD evaluation tasks while improving the coverage of supervised systems, reducing training time and model size, and requiring no additional training data.
We find that CSA, GA, and SA all eventually converge to similar results (0.98 F1 score), but CSA converges in fewer scorer calls and reaches 0.95 F1 before SA does.
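For concreteness, a hedged sketch of how convergence can be measured "in scorer calls": wrap the scoring function in a counter and record the best score seen after each call, here inside a plain simulated-annealing loop. The toy scorer, neighbour function, and cooling schedule are placeholders, not the paper's setup.

```python
import math
import random

def make_counting_scorer(scorer):
    """Wrap a scorer so every evaluation increments a shared call counter."""
    calls = {"n": 0}
    def counted(x):
        calls["n"] += 1
        return scorer(x)
    return counted, calls

def simulated_annealing(scorer, init, neighbour, steps=1000, t0=1.0):
    scorer, calls = make_counting_scorer(scorer)
    x = init
    cur = scorer(x)
    best = cur
    trace = [(calls["n"], best)]           # (scorer calls so far, best score)
    for i in range(steps):
        t = t0 * (1 - i / steps) + 1e-9    # linear cooling schedule
        cand = neighbour(x)
        s = scorer(cand)
        # accept improvements always, worse moves with temperature-scaled prob.
        if s >= cur or random.random() < math.exp((s - cur) / t):
            x, cur = cand, s
        best = max(best, cur)
        trace.append((calls["n"], best))
    return best, trace

# Toy usage: maximise a 1-D function; a real run would score F1 instead.
best, trace = simulated_annealing(
    scorer=lambda x: -abs(x - 3.14),
    init=0.0,
    neighbour=lambda x: x + random.uniform(-0.5, 0.5),
)
```

Plotting `trace` for each optimizer puts them on a common x-axis of scorer calls, which is how "converges faster" can be compared independently of wall-clock time.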