Self-Attention with Relative Position Representations

NAACL 2018 · Peter Shaw, Jakob Uszkoreit, Ashish Vaswani

Relying entirely on an attention mechanism, the Transformer introduced by Vaswani et al. (2017) achieves state-of-the-art results for machine translation. In contrast to recurrent and convolutional neural networks, it does not explicitly model relative or absolute position information in its structure. Instead, it requires adding representations of absolute positions to its inputs. In this work we present an alternative approach, extending the self-attention mechanism to efficiently consider representations of the relative positions, or distances between sequence elements. On the WMT 2014 English-to-German and English-to-French translation tasks, this approach yields improvements of 1.3 BLEU and 0.3 BLEU over absolute position representations, respectively. Notably, we observe that combining relative and absolute position representations yields no further improvement in translation quality. We describe an efficient implementation of our method and cast it as an instance of relation-aware self-attention mechanisms that can generalize to arbitrary graph-labeled inputs.
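As a rough illustration of the mechanism the abstract describes, the sketch below adds learned relative position vectors to the keys and values of a single attention head. It is a minimal pure-Python sketch and not the authors' implementation: the function names, the argument layout, and the clipping distance `k` are assumptions made for this example, and the inputs are assumed to be already projected into query, key, and value vectors. Offsets `j - i` are clipped to `[-k, k]`, so only `2k + 1` relative vectors are needed per head.

```python
import math

def clipped_relative_index(i, j, k):
    """Map the offset j - i to an index in [0, 2k] after clipping to [-k, k]."""
    return max(-k, min(k, j - i)) + k

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def relative_self_attention(queries, keys, values, rel_k, rel_v, k):
    """Single-head self-attention with relative position vectors.

    queries, keys, values: n vectors of length d (assumed already projected).
    rel_k, rel_v: 2k + 1 vectors of length d, one per clipped offset
                  (hypothetical names for the learned relative embeddings).
    """
    n, d = len(queries), len(queries[0])
    outputs = []
    for i in range(n):
        scores = []
        for j in range(n):
            r = clipped_relative_index(i, j, k)
            # The key for position j is shifted by the vector for offset j - i.
            key = [a + b for a, b in zip(keys[j], rel_k[r])]
            score = sum(q * c for q, c in zip(queries[i], key))
            scores.append(score / math.sqrt(d))
        weights = softmax(scores)
        z = [0.0] * d
        for j in range(n):
            r = clipped_relative_index(i, j, k)
            # The value for position j is shifted in the same way.
            for t in range(d):
                z[t] += weights[j] * (values[j][t] + rel_v[r][t])
        outputs.append(z)
    return outputs
```

With all relative vectors set to zero, the function reduces to ordinary scaled dot-product attention, which is a convenient sanity check. This loop form costs O(n² d) per head and is written for readability; the efficient implementation mentioned in the abstract is not reproduced here.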

Code

Repository	Stars
tensorflow/tensor2tensor (official)	14,887
opennmt/ctranslate2	2,796
OpenNMT/OpenNMT-tf	1,441
The-AI-Summer/self-attention-cv	1,141
THUNLP-MT/THUMT	691

Five of the 12 listed implementations are shown.

Tasks

Machine Translation • Position • Translation

Datasets

WMT 2014

Results from the Paper

Ranked #22 on Machine Translation on WMT2014 English-French

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Machine Translation	WMT2014 English-French	Transformer (big) + Relative Position Representations	BLEU score	41.5	#22
Machine Translation	WMT2014 English-German	Transformer (big) + Relative Position Representations	BLEU score	29.2	#29

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Relative Position Encodings • ReLU • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
