Universal Transformers

ICLR 2019 · Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Łukasz Kaiser

Recurrent neural networks (RNNs) sequentially process data by updating their state with each new data point, and have long been the de facto choice for sequence modeling tasks. However, their inherently sequential computation makes them slow to train...


Evaluation results from the paper


Task | Dataset | Model | Metric name | Metric value | Global rank
Machine Translation | WMT2014 English-German | universal transformer base | BLEU score | 28.9 | #8