Universal Transformers

ICLR 2019 • Mostafa Dehghani • Stephan Gouws • Oriol Vinyals • Jakob Uszkoreit • Łukasz Kaiser

Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train...


Evaluation results from the paper

| Task | Dataset | Model | Metric name | Metric value | Global rank |
|---|---|---|---|---|---|
| Machine Translation | WMT 2014 EN-DE | universal transformer base | BLEU | 28.9 | #1 |
| Machine Translation | WMT2014 English-German | universal transformer base | BLEU score | 28.9 | #6 |