Attention Is All You Need

NeurIPS 2017 Ashish VaswaniNoam ShazeerNiki ParmarJakob UszkoreitLlion JonesAidan N. GomezLukasz KaiserIllia Polosukhin

The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism... (read more)

PDF Abstract

Code


Evaluation results from the paper


Task Dataset Model Metric name Metric value Global rank Compare
Machine Translation IWSLT2015 English-German Transformer BLEU score 28.23 # 1
Machine Translation IWSLT2015 German-English Transformer BLEU score 34.44 # 1
Constituency Parsing Penn Treebank Transformer F1 score 92.7 # 9
Machine Translation WMT2014 English-French Transformer Base BLEU score 38.1 # 14
Machine Translation WMT2014 English-French Transformer Big BLEU score 41.0 # 8
Machine Translation WMT2014 English-German Transformer Base BLEU score 27.3 # 7
Machine Translation WMT2014 English-German Transformer Big BLEU score 28.4 # 6