Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

25 Sep 2018 · Xiang Kong, Qizhe Xie, Zihang Dai, Eduard Hovy

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its large memory and computation cost, which stems from the need to compute multiple Softmaxes over the full vocabulary...
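For context, MoS (Yang et al., 2018) replaces the single output Softmax of a language model with a weighted mixture of K Softmaxes, each taken over the entire vocabulary; this is why its time and memory cost grows with both K and the vocabulary size. The sketch below illustrates that structure in PyTorch. It is not the paper's implementation, and names such as `MixtureOfSoftmaxes` and `n_mix` are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MixtureOfSoftmaxes(nn.Module):
    """Minimal sketch of a Mixture of Softmaxes output layer,
    following the formulation of Yang et al. (2018).
    Hyperparameter names and defaults here are illustrative."""

    def __init__(self, d_model: int, vocab_size: int, n_mix: int = 4):
        super().__init__()
        self.n_mix = n_mix
        # Prior (mixture-weight) logits: one scalar per component.
        self.prior = nn.Linear(d_model, n_mix)
        # Projection producing one context vector per component.
        self.latent = nn.Linear(d_model, n_mix * d_model)
        # Shared output projection over the (sub)word vocabulary.
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, d_model) hidden state from the decoder.
        batch, d_model = h.shape
        pi = F.softmax(self.prior(h), dim=-1)          # (batch, K)
        hk = torch.tanh(self.latent(h))                # (batch, K*d)
        hk = hk.view(batch * self.n_mix, d_model)
        # K Softmaxes over the full vocabulary -- the expensive step:
        # cost scales with K * vocab_size, which BPE / Hybrid-LightRNN
        # shrink by reducing vocab_size.
        probs = F.softmax(self.decoder(hk), dim=-1)    # (batch*K, V)
        probs = probs.view(batch, self.n_mix, -1)
        # Mix the K vocabulary distributions with the prior weights.
        return torch.einsum("bk,bkv->bv", pi, probs)   # (batch, V)
```

With this structure, the dominant cost is the K projections to, and normalizations over, the vocabulary; shrinking the (sub)word vocabulary with BPE or Hybrid-LightRNN therefore directly reduces MoS's time and memory footprint, which is the trade-off the paper exploits.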


Evaluation Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|------|---------|-------|--------|-------|-------------|
| Machine Translation | WMT2014 English-French | Transformer Big + MoS | BLEU score | 42.1 | #8 |
| Machine Translation | WMT2014 English-German | Transformer Big + MoS | BLEU score | 29.6 | #7 |