Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. This work presents an alternative to the Transformer's absolute position encodings, extending the self-attention mechanism to efficiently incorporate representations of the relative positions, or distances, between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively.
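The sketch below illustrates the core idea under the formulation of Shaw et al. (2018): relative distances between positions are clipped to a maximum range, and learned embeddings of those distances are added to the attention logits (relative key embeddings) and to the weighted sum of values (relative value embeddings). Function and variable names such as `relative_self_attention` and `max_rel_dist` are illustrative assumptions, not identifiers from the paper's implementation.

```python
# Minimal single-head sketch of self-attention with relative position
# representations, following Shaw et al. (2018). Names are illustrative.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def relative_self_attention(x, Wq, Wk, Wv, rel_k, rel_v, max_rel_dist):
    """Self-attention where each query-key pair also attends to an
    embedding of the clipped relative distance between positions.

    x:      (n, d_model) input sequence
    rel_k:  (2 * max_rel_dist + 1, d_head) relative *key* embeddings
    rel_v:  (2 * max_rel_dist + 1, d_head) relative *value* embeddings
    """
    n, _ = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv          # (n, d_head) each
    d_head = q.shape[-1]

    # Clip relative distances j - i to [-max_rel_dist, max_rel_dist],
    # then shift into [0, 2 * max_rel_dist] to index the embedding tables.
    idx = np.arange(n)
    rel = np.clip(idx[None, :] - idx[:, None], -max_rel_dist, max_rel_dist)
    rel += max_rel_dist                        # (n, n) integer indices

    # Logits: content term plus relative-position term,
    # e_ij = (q_i . k_j + q_i . a^K_ij) / sqrt(d_head)
    logits = q @ k.T + np.einsum('id,ijd->ij', q, rel_k[rel])
    attn = softmax(logits / np.sqrt(d_head))

    # Output: weighted values plus weighted relative value embeddings,
    # z_i = sum_j attn_ij * (v_j + a^V_ij)
    return attn @ v + np.einsum('ij,ijd->id', attn, rel_v[rel])

# Tiny usage example with random weights.
rng = np.random.default_rng(0)
n, d_model, d_head, K = 5, 16, 8, 2
x = rng.normal(size=(n, d_model))
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
rel_k = rng.normal(size=(2 * K + 1, d_head))
rel_v = rng.normal(size=(2 * K + 1, d_head))
print(relative_self_attention(x, Wq, Wk, Wv, rel_k, rel_v, K).shape)  # (5, 8)
```

Clipping keeps the number of learned relative embeddings constant regardless of sequence length, which is what lets the mechanism generalize to lengths not seen during training.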
| Task | Dataset | Model | Metric name | Metric value | Global rank |
| --- | --- | --- | --- | --- | --- |
| Machine Translation | WMT2014 English-French | Transformer (big) + Relative Position Representations | BLEU score | 41.5 | #5 |
| Machine Translation | WMT2014 English-German | Transformer (big) + Relative Position Representations | BLEU score | 29.2 | #6 |