Universal Transformers

10 Jul 2018 • Mostafa Dehghani • Stephan Gouws • Oriol Vinyals • Jakob Uszkoreit • Łukasz Kaiser

Moreover, and in contrast to RNNs, the Transformer model is not computationally universal, limiting its theoretical expressivity. In this paper we propose the Universal Transformer, which addresses these practical and theoretical shortcomings, and we show that it leads to improved performance on several tasks. We further employ an adaptive computation time (ACT) mechanism to allow the model to dynamically adjust the number of times the representation of each position in a sequence is revised.
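The per-position halting idea behind ACT can be illustrated with a small sketch. This is not the authors' implementation: it follows the general ACT recipe (accumulate a halting probability per position and stop once it reaches 1 - epsilon), and the function name `act_schedule`, the `epsilon` parameter, and the hand-written halting probabilities are all assumptions made for illustration. In a real model the halting probabilities would be predicted by the network at each step rather than supplied as input.

```python
def act_schedule(halting_probs, epsilon=0.01):
    """Toy per-position ACT schedule (illustrative, hypothetical helper).

    halting_probs[t][i] is an assumed halting probability for position i
    at refinement step t. Returns, for each position, how many times it
    is revised and the weights given to each of its intermediate states.
    """
    num_steps = len(halting_probs)
    num_positions = len(halting_probs[0])
    accumulated = [0.0] * num_positions   # running sum of halting probabilities
    halted = [False] * num_positions
    weights = [[] for _ in range(num_positions)]

    for t in range(num_steps):
        last_step = t == num_steps - 1
        for i in range(num_positions):
            if halted[i]:
                continue  # this position is no longer revised
            p = halting_probs[t][i]
            if accumulated[i] + p >= 1.0 - epsilon or last_step:
                # Final revision: assign the remainder so the weights sum to 1.
                weights[i].append(1.0 - accumulated[i])
                halted[i] = True
            else:
                weights[i].append(p)
                accumulated[i] += p

    updates = [len(w) for w in weights]
    return updates, weights


# Three positions with constant (made-up) halting probabilities over 5 steps.
probs = [[0.9, 0.4, 0.1]] * 5
updates, weights = act_schedule(probs)
print(updates)
```

With these made-up inputs the three positions are revised 2, 3, and 5 times respectively: a position that is confident early stops being revised, while an uncertain one uses the full step budget.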


Task                 Dataset                 Model                       Metric name  Metric value  Global rank
Machine Translation  WMT 2014 EN-DE          universal transformer base  BLEU         28.9          #1
Machine Translation  WMT2014 English-German  universal transformer base  BLEU score   28.9          #10