TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Machine Translation	WMT2014 English-French	RNMT+	BLEU score	41.0	# 26
Machine Translation	WMT2014 English-French	RNMT+	Hardware Burden	132G	# 1
Machine Translation	WMT2014 English-French	RNMT+	Operations per network pass	2.81G	# 1
Machine Translation	WMT2014 English-German	RNMT+	BLEU score	28.5	# 42
Machine Translation	WMT2014 English-German	RNMT+	Hardware Burden	44G	# 1
Machine Translation	WMT2014 English-German	RNMT+	Operations per network pass	2.81G	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-best-of-both-worlds-combining-recent/machine-translation-on-wmt2014-english-french)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-french?p=the-best-of-both-worlds-combining-recent)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-best-of-both-worlds-combining-recent/machine-translation-on-wmt2014-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-german?p=the-best-of-both-worlds-combining-recent)`
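
Both badge snippets above follow the same URL pattern, differing only in the task slug. A minimal sketch of that pattern in Python follows. The function name `pwc_badge_markdown` and its parameter names are hypothetical, chosen for illustration; the URL layout is inferred only from the two badges shown here and is not a documented API.

```python
# Hypothetical helper: rebuilds the badge Markdown shown above from two slugs.
# The URL layout is inferred from the two example badges, not from any spec.
BADGE_ENDPOINT = "https://img.shields.io/endpoint.svg"
PWC_BASE = "https://paperswithcode.com"


def pwc_badge_markdown(paper_slug: str, task_slug: str) -> str:
    """Return a Markdown image link for a paper/task leaderboard badge."""
    # The badge image is served by shields.io, pointed at a badge endpoint.
    badge_url = f"{BADGE_ENDPOINT}?url={PWC_BASE}/badge/{paper_slug}/{task_slug}"
    # Clicking the badge leads to the leaderboard, with the paper highlighted.
    sota_url = f"{PWC_BASE}/sota/{task_slug}?p={paper_slug}"
    return f"[![PWC]({badge_url})]({sota_url})"


if __name__ == "__main__":
    paper = "the-best-of-both-worlds-combining-recent"
    for task in (
        "machine-translation-on-wmt2014-english-french",
        "machine-translation-on-wmt2014-english-german",
    ):
        print(pwc_badge_markdown(paper, task))
```

Run as a script, this prints the two badge lines listed in the table above.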

The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation

ACL 2018 · Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, George Foster, Llion Jones, Niki Parmar, Mike Schuster, Zhifeng Chen, Yonghui Wu, Macduff Hughes

The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was then outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.

Code

tensorflow/lingvo (official, 2,780 stars)

duyvuleo/Transformer-DyNet

zysite/post

Tasks

Machine Translation

Translation

Datasets

WMT 2014

Results from the Paper

Ranked #26 on Machine Translation on WMT2014 English-French

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • LSTM • Multi-Head Attention • Position-Wise Feed-Forward Layer • ReLU • Residual Connection • Scaled Dot-Product Attention • Seq2Seq • Sigmoid Activation • Softmax • Tanh Activation • Transformer
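
As an illustration of one technique from the list above, here is a minimal sketch of label smoothing with a uniform prior, written in plain Python. It is not taken from the paper or from tensorflow/lingvo. The function names and the smoothing weight `epsilon=0.1` are assumptions made for this example.

```python
import math


def smoothed_targets(num_classes: int, target_index: int, epsilon: float = 0.1) -> list:
    """Mix a one-hot target with a uniform distribution over all classes.

    epsilon is an assumed smoothing weight, not a value taken from this page.
    """
    uniform_share = epsilon / num_classes
    dist = [uniform_share] * num_classes
    # The true class keeps the remaining (1 - epsilon) mass on top of its share.
    dist[target_index] += 1.0 - epsilon
    return dist


def cross_entropy(target_dist: list, log_probs: list) -> float:
    """Cross-entropy between a target distribution and model log-probabilities."""
    return -sum(t * lp for t, lp in zip(target_dist, log_probs))


if __name__ == "__main__":
    # Toy 4-word vocabulary; the model puts most of its mass on index 2.
    probs = [0.05, 0.05, 0.85, 0.05]
    log_probs = [math.log(p) for p in probs]
    targets = smoothed_targets(num_classes=4, target_index=2)
    print(targets)
    print(cross_entropy(targets, log_probs))
```

With smoothing, the loss still rewards the correct token but also penalizes a model that drives the probability of every other token toward zero.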
