The Evolved Transformer

30 Jan 2019 · David R. So, Chen Liang, Quoc V. Le

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run evolutionary architecture search with warm starting, seeding our initial population with the Transformer. To search directly on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which dynamically allocates more resources to more promising candidate models. The architecture found in our experiments, the Evolved Transformer, demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech, and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of ~7M parameters.
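As a rough illustration of the search procedure the abstract describes, here is a minimal, self-contained sketch of evolutionary search with warm starting and Progressive Dynamic Hurdles. Everything in it is an illustrative stand-in rather than the paper's implementation: the operation vocabulary (`OPS`), the seed encoding (`TRANSFORMER_SEED`), the hurdle schedule, and especially `fitness`, which fakes "train for N steps and evaluate" with a noisy score instead of training real child models.

```python
import random

# Candidate operations a mutation can swap in; purely illustrative.
OPS = ["self_attention", "ffn", "conv_1x1", "glu", "identity"]

# Warm start: encode the Transformer itself as the initial genotype.
TRANSFORMER_SEED = ["self_attention", "ffn", "self_attention", "ffn"]

def mutate(genes):
    """Replace one randomly chosen position with a random operation."""
    child = list(genes)
    child[random.randrange(len(child))] = random.choice(OPS)
    return child

def fitness(genes, steps):
    """Stand-in for 'train this child for `steps` and score it'.
    More training steps -> less noisy estimate of the true quality."""
    quality = sum(op != "identity" for op in genes) / len(genes)
    return quality + random.gauss(0, 0.1 / (steps ** 0.5))

def evolve(pop_size=20, generations=5, hurdles=(1, 4, 16)):
    # Seed the whole initial population with the Transformer.
    population = [list(TRANSFORMER_SEED) for _ in range(pop_size)]
    for _ in range(generations):
        scored = []
        for genes in (mutate(g) for g in population):
            score, budget = 0.0, 0
            # Progressive Dynamic Hurdles: spend a small budget first;
            # only children that clear the hurdle earn more training.
            # (The paper uses per-hurdle thresholds; comparing against
            # the mean of already-scored children is a simplification.)
            for steps in hurdles:
                budget += steps
                score = fitness(genes, budget)
                earlier = [s for _, s in scored]
                if earlier and score < sum(earlier) / len(earlier):
                    break
            scored.append((genes, score))
        scored.sort(key=lambda gs: gs[1], reverse=True)
        population = [g for g, _ in scored[:pop_size]]
    return population[0]

print(evolve())  # best genotype found by the toy search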

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Language Modelling | One Billion Word | Evolved Transformer Big | PPL | 28.6 | #14 |
| Machine Translation | WMT2014 English-Czech | Evolved Transformer Big | BLEU score | 28.2 | #1 |
| Machine Translation | WMT2014 English-Czech | Evolved Transformer Base | BLEU score | 27.6 | #2 |
| Machine Translation | WMT2014 English-French | Evolved Transformer Base | BLEU score | 40.6 | #24 |
| Machine Translation | WMT2014 English-French | Evolved Transformer Big | BLEU score | 41.3 | #20 |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big | BLEU score | 29.8 | #11 |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big | SacreBLEU | 29.2 | #6 |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big | BLEU score | 29.3 | #18 |
| Machine Translation | WMT2014 English-German | Evolved Transformer Base | BLEU score | 28.4 | #34 |
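The WMT'14 English-German rows report both tokenized BLEU and SacreBLEU; SacreBLEU standardizes the reference handling and tokenization so that scores are comparable across papers. To reproduce such a score for your own system outputs, the `sacrebleu` Python package can be used roughly as follows (the hypothesis and reference strings here are placeholders):

```python
import sacrebleu  # pip install sacrebleu

# Placeholder system outputs and references, one string per sentence.
hypotheses = ["The cat sat on the mat."]
references = [["The cat sat on the mat."]]  # one list per reference set

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(bleu.score)  # corpus-level BLEU; 100.0 for an exact match
```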

Methods used in the Paper


| Method | Type |
|---|---|
| Sigmoid Activation | Activation Functions |
| Tanh Activation | Activation Functions |
| LSTM | Recurrent Neural Networks |
| Residual Connection | Skip Connections |
| BPE | Subword Segmentation |
| Dense Connections | Feedforward Networks |
| Label Smoothing | Regularization |
| ReLU | Activation Functions |
| Adam | Stochastic Optimization |
| Softmax | Output Functions |
| Dropout | Regularization |
| Multi-Head Attention | Attention Modules |
| Layer Normalization | Normalization |
| Scaled Dot-Product Attention | Attention Mechanisms |
| Transformer | Transformers |
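Among the methods listed above, Scaled Dot-Product Attention is the building block that Multi-Head Attention and the Transformer itself reuse: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch follows; the shapes and toy inputs are illustrative only.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Toy usage: 4 query/key positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```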