The Evolved Transformer

30 Jan 2019  ·  David R. So, Chen Liang, Quoc V. Le ·

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run evolutionary architecture search, warm-started by seeding our initial population with the Transformer. To search directly on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
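The core idea of Progressive Dynamic Hurdles, as described above, is that every candidate first trains for a small step budget, and only candidates whose fitness clears a hurdle derived from the population's scores earn the next, larger training budget. A minimal sketch of that staged filtering, with a toy `evaluate` function standing in for the expensive WMT'14 En-De training runs (the `quality` field, the mean-fitness hurdle, and the fixed budget schedule are illustrative assumptions, not the paper's exact procedure):

```python
import random

def evaluate(candidate, steps):
    # Toy stand-in for training `candidate` for `steps` steps and
    # returning a fitness score (e.g. negative validation loss).
    # More steps -> less noisy estimate of the candidate's quality.
    return candidate["quality"] + random.gauss(0, 0.1) / steps

def progressive_dynamic_hurdles(population, budgets):
    """Sketch of staged search: at each stage, survivors train with a
    larger budget, and only those at or above the hurdle (here, the mean
    fitness of the current stage's candidates) advance to the next stage."""
    survivors = list(population)
    for steps in budgets:
        scored = [(evaluate(c, steps), c) for c in survivors]
        hurdle = sum(f for f, _ in scored) / len(scored)  # dynamic hurdle
        survivors = [c for f, c in scored if f >= hurdle]
        if not survivors:
            break
    return survivors
```

The payoff is that weak candidates consume only the small early budgets, so most compute is concentrated on the models most likely to matter, which is what makes direct search on an expensive task feasible.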

| Task                | Dataset                | Model                    | Metric          | Value | Global Rank |
|---------------------|------------------------|--------------------------|-----------------|-------|-------------|
| Language Modelling  | One Billion Word       | Evolved Transformer Big  | PPL             | 28.6  | #15         |
| Machine Translation | WMT2014 English-Czech  | Evolved Transformer Base | BLEU score      | 27.6  | #2          |
| Machine Translation | WMT2014 English-Czech  | Evolved Transformer Big  | BLEU score      | 28.2  | #1          |
| Machine Translation | WMT2014 English-French | Evolved Transformer Big  | BLEU score      | 41.3  | #22         |
| Machine Translation | WMT2014 English-French | Evolved Transformer Base | BLEU score      | 40.6  | #26         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Base | BLEU score      | 28.4  | #36         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Base | Hardware Burden | 2488G | #1          |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | BLEU score      | 29.3  | #20         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | BLEU score      | 29.8  | #13         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | SacreBLEU       | 29.2  | #6          |
