Fast and Simple Mixture of Softmaxes with BPE and Hybrid-LightRNN for Language Generation

25 Sep 2018  ·  Xiang Kong, Qizhe Xie, Zihang Dai, Eduard Hovy

Mixture of Softmaxes (MoS) has been shown to be effective at addressing the expressiveness limitation of Softmax-based models. Despite this known advantage, MoS is held back in practice by its heavy memory and computation cost, since it must compute multiple Softmaxes. In this work, we set out to unleash the power of MoS in practical applications by investigating improved word coding schemes, which can effectively reduce the vocabulary size and hence relieve the memory and computation burden. We show that both BPE and our proposed Hybrid-LightRNN yield encoding mechanisms that can halve the time and memory consumption of MoS without performance loss. With MoS, we achieve an improvement of 1.5 BLEU on the IWSLT 2014 German-to-English corpus and an improvement of 0.76 CIDEr on image captioning. Moreover, on the larger WMT 2014 machine translation dataset, our MoS-boosted Transformer yields a 29.5 BLEU score for English-to-German and a 42.1 BLEU score for English-to-French, outperforming the single-Softmax Transformer by 0.8 and 0.4 BLEU respectively and achieving the state-of-the-art result on the WMT 2014 English-to-German task.
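As a rough illustration of the output layer discussed above, the sketch below implements a mixture-of-softmaxes head in PyTorch: K softmax components over the (BPE- or Hybrid-LightRNN-reduced) vocabulary are mixed by context-dependent prior weights computed from the decoder state. The class name, layer names, and the choice of K = 4 are illustrative assumptions for this sketch, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    """Minimal mixture-of-softmaxes output layer (illustrative sketch).

    K softmax components over the vocabulary are mixed with prior weights
    predicted from the decoder hidden state, lifting the rank bottleneck
    of a single softmax. Names and K=4 are assumptions, not the paper's code.
    """

    def __init__(self, hidden_dim, vocab_size, n_components=4):
        super().__init__()
        self.n_components = n_components
        # Prior (mixture-weight) network and per-component context projections.
        self.prior = nn.Linear(hidden_dim, n_components)
        self.latent = nn.Linear(hidden_dim, n_components * hidden_dim)
        # Output projection over the (subword-reduced) vocabulary, shared by all components.
        self.decoder = nn.Linear(hidden_dim, vocab_size)

    def forward(self, h):
        # h: (batch, hidden_dim) decoder states.
        pi = F.softmax(self.prior(h), dim=-1)                # (batch, K) mixture weights
        hk = torch.tanh(self.latent(h))                      # (batch, K * hidden_dim)
        hk = hk.view(-1, self.n_components, h.size(-1))      # (batch, K, hidden_dim)
        probs = F.softmax(self.decoder(hk), dim=-1)          # (batch, K, vocab) per-component softmaxes
        mixed = torch.sum(pi.unsqueeze(-1) * probs, dim=1)   # (batch, vocab) mixed distribution
        return torch.log(mixed + 1e-8)                       # log-probs for an NLL loss
```

Because the K component softmaxes are all computed over the full output vocabulary, shrinking that vocabulary with BPE or Hybrid-LightRNN directly reduces the layer's dominant memory and time cost, which is the point made in the abstract.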


Datasets

IWSLT 2014 German-English · WMT 2014 English-German · WMT 2014 English-French
Task                | Dataset                | Model                 | Metric     | Value | Global Rank
Machine Translation | WMT2014 English-French | Transformer Big + MoS | BLEU score | 42.1  | #18
Machine Translation | WMT2014 English-German | Transformer Big + MoS | BLEU score | 29.6  | #20

Methods