Effective Dimensionality Reduction for Word Embeddings

WS 2019  ·  Vikas Raunak, Vivek Gupta, Florian Metze ·

Pre-trained word embeddings are used in several downstream applications as well as for constructing representations for sentences, paragraphs and documents. Recently, there has been an emphasis on improving the pretrained word vectors through post-processing algorithms. One improvement area is reducing the dimensionality of word embeddings. Reducing the size of word embeddings can improve their utility in memory constrained devices, benefiting several real world applications. In this work, we present a novel technique that efficiently combines PCA based dimensionality reduction with a recently proposed post-processing algorithm (Mu and Viswanath, 2018), to construct effective word embeddings of lower dimensions. Empirical evaluations on several benchmarks show that our algorithm efficiently reduces the embedding size while achieving similar or (more often) better performance than original embeddings. We have released the source code along with this paper.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods