Sentence Selection Strategies for Distilling Word Embeddings from BERT

LREC 2022 · Yixiao Wang, Zied Bouraoui, Luis Espinosa Anke, Steven Schockaert

Many applications crucially rely on the availability of high-quality word vectors. To learn such representations, several strategies based on language models have been proposed in recent years. While effective, these methods typically rely on a large number of contextualised vectors for each word, which makes them impractical. In this paper, we investigate whether similar results can be obtained when only a few contextualised representations of each word can be used. To this end, we analyse a range of strategies for selecting the most informative sentences. Our results show that with a careful selection strategy, high-quality word vectors can be learned from as few as 5 to 10 sentences.
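As a rough illustration of the idea, the following minimal sketch builds a static word vector from only the top-k sentences for a word. It assumes the contextualised vectors have already been extracted (for example from BERT) and that the word vector is their plain average; the abstract does not state the aggregation step or the selection criteria, so the averaging, the `sentence_scores` input, and the function names are hypothetical placeholders rather than the authors' method.

```python
from typing import List, Sequence


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Component-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def distill_word_vector(
    contextual_vectors: Sequence[Sequence[float]],
    sentence_scores: Sequence[float],
    k: int = 5,
) -> List[float]:
    """Average the vectors of the k highest-scoring sentences.

    contextual_vectors[i] is the precomputed vector of the target word
    in sentence i. sentence_scores[i] is a hypothetical informativeness
    score for that sentence (an assumption, not taken from the paper).
    """
    if len(contextual_vectors) != len(sentence_scores):
        raise ValueError("one score is required per sentence")
    # Rank sentence indices from most to least informative.
    ranked = sorted(
        range(len(sentence_scores)),
        key=lambda i: sentence_scores[i],
        reverse=True,
    )
    selected = [contextual_vectors[i] for i in ranked[:k]]
    return mean_vector(selected)


# Toy data: four 2-dimensional "contextualised" vectors for one word.
vectors = [[1.0, 0.0], [0.0, 1.0], [3.0, 2.0], [5.0, 5.0]]
scores = [0.9, 0.1, 0.8, 0.2]
word_vector = distill_word_vector(vectors, scores, k=2)
```

With `k=2`, only the first and third sentences are kept, so the result is the mean of their two vectors. In practice the score would come from whichever selection strategy is being compared, and `k` would be in the 5 to 10 range that the abstract reports as sufficient.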
