Def2Vec: Extensible Word Embeddings from Dictionary Definitions

ICNLSP 2023 · Irene Morazzoni, Vincenzo Scotti, Roberto Tedesco

Def2Vec introduces a novel paradigm for word embeddings, leveraging dictionary definitions to learn semantic representations. By constructing term-document matrices from definitions and applying Latent Semantic Analysis (LSA), Def2Vec generates embeddings that offer both strong performance and extensibility. In evaluations encompassing Part-of-Speech tagging, Named Entity Recognition, chunking, and semantic similarity, Def2Vec often matches or surpasses state-of-the-art models like Word2Vec, GloVe, and fastText. Our model's second factorised matrix resulting from LSA enables efficient embedding extension for out-of-vocabulary words. By effectively reconciling the advantages of dictionary definitions with LSA-based embeddings, Def2Vec yields informative semantic representations, especially considering its reduced data requirements. This paper advances the understanding of word embedding generation by incorporating structured lexical information and efficient embedding extension.
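The pipeline the abstract describes can be sketched in a few lines of NumPy: build a term-document matrix from definitions, factorise it with truncated SVD (the core of LSA), take one factor as the word embeddings, and use the term-side factor to fold in an out-of-vocabulary word from its definition. The toy dictionary, the rank `k`, and the `embed_definition` helper below are illustrative assumptions, not the paper's actual data or API.

```python
import numpy as np

# Toy dictionary (assumed example data, not from the paper): word -> definition
dictionary = {
    "cat": "small domesticated feline animal",
    "dog": "domesticated canine animal",
    "car": "road vehicle with engine",
}

# Vocabulary over the tokens appearing in definitions
defs = [d.split() for d in dictionary.values()]
vocab = sorted({t for d in defs for t in d})
idx = {t: i for i, t in enumerate(vocab)}

# Term-document matrix: rows = definition terms, columns = defined words
X = np.zeros((len(vocab), len(dictionary)))
for j, tokens in enumerate(defs):
    for t in tokens:
        X[idx[t], j] += 1.0

# LSA via truncated SVD: X ~ U @ diag(s) @ Vt
k = 2  # latent dimensionality (hypothetical choice for this toy example)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

# One embedding per defined word: the scaled columns of Vt
word_emb = (Vt * s[:, None]).T  # shape (n_words, k)

def embed_definition(definition):
    """Fold an out-of-vocabulary word into the latent space via its
    definition, using the term-side factor U (the second factorised
    matrix mentioned in the abstract)."""
    d = np.zeros(len(vocab))
    for t in definition.split():
        if t in idx:
            d[idx[t]] += 1.0
    return d @ U  # project the definition's term counts into the k-dim space

# An unseen word defined with feline-related terms lands near "cat"
oov = embed_definition("large feline animal")
```

Because an in-vocabulary column satisfies `U.T @ X[:, j] = s * Vt[:, j]`, projecting a definition's count vector through `U` yields embeddings that live in the same space as the precomputed ones, which is what makes the extension to out-of-vocabulary words cheap.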


Results from the Paper


Task                 Dataset         Model     Metric Name           Metric Value   Global Rank
Chunking             CoNLL 2003      Def2Vec   Accuracy              77.69          # 1
Chunking             CoNLL 2003      Def2Vec   F1                    81.45          # 1
Chunking             CoNLL 2003      Def2Vec   Precision             86.56          # 1
Chunking             CoNLL 2003      Def2Vec   Recall                77.69          # 1
Chunking             CoNLL 2003      Def2Vec   AUC                   93.07          # 1
NER                  CoNLL 2003      Def2Vec   Accuracy              71.98          # 1
NER                  CoNLL 2003      Def2Vec   F1                    83.09          # 1
NER                  CoNLL 2003      Def2Vec   Precision             99.28          # 1
NER                  CoNLL 2003      Def2Vec   Recall                71.98          # 1
NER                  CoNLL 2003      Def2Vec   AUC                   96.28          # 1
POS                  CoNLL 2003      Def2Vec   Accuracy              72.42          # 1
POS                  CoNLL 2003      Def2Vec   F1                    76.55          # 1
POS                  CoNLL 2003      Def2Vec   Precision             85.41          # 1
POS                  CoNLL 2003      Def2Vec   Recall                72.42          # 1
POS                  CoNLL 2003      Def2Vec   AUC                   94.63          # 1
Semantic Similarity  STS Benchmark   Def2Vec   Spearman Correlation  63.72          # 1

Methods


fastText • GloVe