The Evolved Transformer

30 Jan 2019  ·  David R. So, Chen Liang, Quoc V. Le ·

Recent works have highlighted the strength of the Transformer architecture on sequence tasks while, at the same time, neural architecture search (NAS) has begun to outperform human-designed models. Our goal is to apply NAS to search for a better alternative to the Transformer. We first construct a large search space inspired by recent advances in feed-forward sequence models and then run evolutionary architecture search, warm-started by seeding our initial population with the Transformer. To search directly on the computationally expensive WMT 2014 English-German translation task, we develop the Progressive Dynamic Hurdles method, which allows us to dynamically allocate more resources to more promising candidate models. The architecture found in our experiments -- the Evolved Transformer -- demonstrates consistent improvement over the Transformer on four well-established language tasks: WMT 2014 English-German, WMT 2014 English-French, WMT 2014 English-Czech and LM1B. At a big model size, the Evolved Transformer establishes a new state-of-the-art BLEU score of 29.8 on WMT'14 English-German; at smaller sizes, it achieves the same quality as the original "big" Transformer with 37.6% fewer parameters and outperforms the Transformer by 0.7 BLEU at a mobile-friendly model size of 7M parameters.
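The core idea of Progressive Dynamic Hurdles, as described above, is that every candidate first trains for a small step budget, and only candidates whose fitness clears a hurdle derived from the population's scores earn the next, larger training budget. A minimal sketch of that staged filtering, with a toy `evaluate` function standing in for the expensive WMT'14 En-De training runs (the `quality` field, the mean-fitness hurdle, and the fixed budget schedule are illustrative assumptions, not the paper's exact procedure):

```python
import random

def evaluate(candidate, steps):
    # Toy stand-in for training `candidate` for `steps` steps and
    # returning a fitness score (e.g. negative validation loss).
    # More steps -> less noisy estimate of the candidate's quality.
    return candidate["quality"] + random.gauss(0, 0.1) / steps

def progressive_dynamic_hurdles(population, budgets):
    """Sketch of staged search: at each stage, survivors train with a
    larger budget, and only those at or above the hurdle (here, the mean
    fitness of the current stage's candidates) advance to the next stage."""
    survivors = list(population)
    for steps in budgets:
        scored = [(evaluate(c, steps), c) for c in survivors]
        hurdle = sum(f for f, _ in scored) / len(scored)  # dynamic hurdle
        survivors = [c for f, c in scored if f >= hurdle]
        if not survivors:
            break
    return survivors
```

The payoff is that weak candidates consume only the small early budgets, so most compute is concentrated on the models most likely to matter, which is what makes direct search on an expensive task feasible.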

| Task                | Dataset                | Model                    | Metric          | Value | Global Rank |
|---------------------|------------------------|--------------------------|-----------------|-------|-------------|
| Language Modelling  | One Billion Word       | Evolved Transformer Big  | PPL             | 28.6  | #15         |
| Machine Translation | WMT2014 English-Czech  | Evolved Transformer Base | BLEU score      | 27.6  | #2          |
| Machine Translation | WMT2014 English-Czech  | Evolved Transformer Big  | BLEU score      | 28.2  | #1          |
| Machine Translation | WMT2014 English-French | Evolved Transformer Big  | BLEU score      | 41.3  | #22         |
| Machine Translation | WMT2014 English-French | Evolved Transformer Base | BLEU score      | 40.6  | #26         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Base | BLEU score      | 28.4  | #36         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Base | Hardware Burden | 2488G | #1          |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | BLEU score      | 29.3  | #20         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | BLEU score      | 29.8  | #13         |
| Machine Translation | WMT2014 English-German | Evolved Transformer Big  | SacreBLEU       | 29.2  | #6          |
