Self-Attention with Relative Position Representations

NAACL-HLT 2018
Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure...
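To make the mechanism concrete, below is a minimal NumPy sketch of single-head self-attention extended with relative position representations, in the spirit of the approach described in the paper: the attention logits and the output each incorporate learned embeddings of the clipped relative distance between positions. Function and parameter names (relative_attention, rel_k, rel_v, max_dist) are illustrative assumptions, not the authors' code; the actual model uses multiple heads, masking, and trained weights.

```python
import numpy as np

def relative_attention(x, Wq, Wk, Wv, rel_k, rel_v, max_dist):
    """Single-head self-attention with relative position representations
    (a sketch, not the reference implementation).

    x            : (n, d_model) input sequence
    Wq, Wk, Wv   : (d_model, d) projection matrices
    rel_k, rel_v : (2*max_dist + 1, d) embeddings of clipped relative
                   distances in [-max_dist, max_dist]
    """
    n = x.shape[0]
    d = Wq.shape[1]
    q, k, v = x @ Wq, x @ Wk, x @ Wv                       # (n, d)

    # Clipped relative distance j - i, shifted to index the embedding tables.
    idx = np.clip(np.arange(n)[None, :] - np.arange(n)[:, None],
                  -max_dist, max_dist) + max_dist          # (n, n)
    a_k, a_v = rel_k[idx], rel_v[idx]                      # (n, n, d)

    # Logits: each query scores keys plus their relative-position embeddings.
    logits = (q @ k.T + np.einsum('id,ijd->ij', q, a_k)) / np.sqrt(d)
    alpha = np.exp(logits - logits.max(axis=-1, keepdims=True))
    alpha /= alpha.sum(axis=-1, keepdims=True)             # softmax over j

    # Output: weighted sum of values plus their relative-position embeddings.
    return alpha @ v + np.einsum('ij,ijd->id', alpha, a_v)

# Tiny usage example with random weights (illustration only).
rng = np.random.default_rng(0)
n, d_model, d, max_dist = 5, 16, 8, 2
out = relative_attention(rng.normal(size=(n, d_model)),
                         rng.normal(size=(d_model, d)),
                         rng.normal(size=(d_model, d)),
                         rng.normal(size=(d_model, d)),
                         rng.normal(size=(2 * max_dist + 1, d)),
                         rng.normal(size=(2 * max_dist + 1, d)),
                         max_dist)
print(out.shape)  # (5, 8)
```

Because the relative distance is clipped to a window, only 2*max_dist + 1 embedding vectors are needed regardless of sequence length, which is what keeps the extension inexpensive relative to standard scaled dot-product attention.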


Evaluation results from the paper


| Task | Dataset | Model | Metric | Value | Global rank |
| --- | --- | --- | --- | --- | --- |
| Machine Translation | WMT 2014 English-French | Transformer (big) + Relative Position Representations | BLEU score | 41.5 | #4 |
| Machine Translation | WMT 2014 English-German | Transformer (big) + Relative Position Representations | BLEU score | 29.2 | #4 |