Joint Source-Target Self Attention with Locality Constraints

16 May 2019 · José A. R. Fonollosa, Noe Casas, Marta R. Costa-jussà

The dominant neural machine translation models are based on the encoder-decoder structure, and many of them rely on an unconstrained receptive field over the source and target sequences. In this paper we study a new architecture that breaks with both conventions. Our simplified architecture consists of the decoder part of a Transformer model, based on self-attention, but with locality constraints applied to the attention receptive field. During training, both the source and target sentences are fed to the network, which is trained as a language model. At inference time, the target tokens are predicted autoregressively, starting with the source sequence as previous tokens. The proposed model achieves a new state of the art of 35.7 BLEU on IWSLT'14 German-English and matches the best reported results in the literature on the WMT'14 English-German and WMT'14 English-French translation benchmarks.
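The sketch below illustrates the general idea described in the abstract, not the authors' actual implementation: source and target tokens are concatenated into a single sequence, processed decoder-style with causal self-attention, and each position is further restricted to a local window of preceding tokens. Function and variable names (e.g. local_causal_mask, window) are illustrative assumptions.

```python
import torch

def local_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    """Boolean mask (seq_len, seq_len): True where attention is allowed.

    Position i may attend to positions j with i - window < j <= i,
    i.e. causal (no future tokens) plus a locality constraint of `window` tokens.
    """
    idx = torch.arange(seq_len)
    rel = idx.unsqueeze(0) - idx.unsqueeze(1)   # rel[i, j] = j - i
    causal = rel <= 0                           # no attention to future positions
    local = rel > -window                       # only the last `window` positions
    return causal & local

def masked_attention(q, k, v, mask):
    """Scaled dot-product attention with the boolean mask applied."""
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5          # (..., L, L)
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Toy usage: 6 source tokens followed by 4 target tokens, window of 5.
src_len, tgt_len, window, d_model = 6, 4, 5, 16
x = torch.randn(1, src_len + tgt_len, d_model)           # joint source+target embeddings
mask = local_causal_mask(src_len + tgt_len, window)
out = masked_attention(x, x, x, mask)                    # self-attention over the joint sequence
print(out.shape)                                         # torch.Size([1, 10, 16])
```

At inference time the same masking applies, but the target positions are filled one token at a time, conditioning on the source tokens and the previously generated target tokens.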


Results


Task                 Dataset                    Model                        Metric      Value   Global Rank
Machine Translation  IWSLT2014 German-English   Local Joint Self-attention   BLEU score  35.7    #19
Machine Translation  WMT2014 English-French     Local Joint Self-attention   BLEU score  43.3    #10
Machine Translation  WMT2014 English-German     Local Joint Self-attention   BLEU score  29.7    #18
