FRAGE: Frequency-Agnostic Word Representation

NeurIPS 2018 · Chengyue Gong, Di He, Xu Tan, Tao Qin, Liwei Wang, Tie-Yan Liu

Continuous word representation (aka word embedding) is a basic building block in many neural network-based models used in natural language processing tasks. Although it is widely accepted that words with similar semantics should be close to each other in the embedding space, we find that word embeddings learned in several tasks are biased towards word frequency: the embeddings of high-frequency and low-frequency words lie in different subregions of the embedding space, and the embedding of a rare word and a popular word can be far from each other even if they are semantically similar.
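
FRAGE counters this bias by training the embeddings adversarially against a discriminator that tries to tell high-frequency words from low-frequency ones; when the discriminator can no longer distinguish them, the embeddings are frequency-agnostic. Below is a minimal sketch of that two-player update, not the authors' code: the PyTorch setup, vocabulary split, network sizes, and label-flipping form of the adversarial loss are illustrative assumptions, and the downstream task loss that would normally be trained jointly is omitted for brevity.

```python
# Minimal sketch of frequency-adversarial embedding training (illustrative
# setup, not the paper's implementation).
import torch
import torch.nn as nn

vocab_size, embed_dim = 1000, 32
num_popular = 200  # assumption: ids [0, 200) are the most frequent words

embedding = nn.Embedding(vocab_size, embed_dim)
# Discriminator outputs the logit of "this embedding belongs to a popular word".
discriminator = nn.Sequential(
    nn.Linear(embed_dim, 64), nn.ReLU(), nn.Linear(64, 1)
)

bce = nn.BCEWithLogitsLoss()
opt_emb = torch.optim.Adam(embedding.parameters(), lr=1e-3)
opt_dis = torch.optim.Adam(discriminator.parameters(), lr=1e-3)

for step in range(100):
    ids = torch.randint(0, vocab_size, (64,))
    is_popular = (ids < num_popular).float().unsqueeze(1)

    # 1) Discriminator step: learn to classify popular vs. rare embeddings.
    logits = discriminator(embedding(ids).detach())
    d_loss = bce(logits, is_popular)
    opt_dis.zero_grad(); d_loss.backward(); opt_dis.step()

    # 2) Embedding step: fool the discriminator (labels flipped here as a
    # common stand-in for gradient reversal); in practice this term is added
    # to the downstream task loss, which is omitted in this sketch.
    logits = discriminator(embedding(ids))
    adv_loss = bce(logits, 1.0 - is_popular)
    opt_emb.zero_grad(); adv_loss.backward(); opt_emb.step()
```

At convergence the discriminator's accuracy should approach chance on the popular/rare split, which is the signal that frequency information has been removed from the embedding geometry.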


Evaluation results from the paper


| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Machine Translation | IWSLT2015 German-English | Transformer with FRAGE | BLEU score | 33.97 | #3 |
| Language Modelling | Penn Treebank (Word Level) | FRAGE + AWD-LSTM-MoS + dynamic eval | Validation perplexity | 47.38 | #1 |
| Language Modelling | Penn Treebank (Word Level) | FRAGE + AWD-LSTM-MoS + dynamic eval | Test perplexity | 46.54 | #2 |
| Language Modelling | Penn Treebank (Word Level) | FRAGE + AWD-LSTM-MoS + dynamic eval | Params | 22M | #1 |
| Language Modelling | WikiText-2 | FRAGE + AWD-LSTM-MoS + dynamic eval | Validation perplexity | 40.85 | #1 |
| Language Modelling | WikiText-2 | FRAGE + AWD-LSTM-MoS + dynamic eval | Test perplexity | 39.14 | #2 |
| Language Modelling | WikiText-2 | FRAGE + AWD-LSTM-MoS + dynamic eval | Params | 35M | #1 |
| Machine Translation | WMT2014 English-German | Transformer Big with FRAGE | BLEU score | 29.11 | #5 |