TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Document Summarization	CNN / Daily Mail	LightConv	ROUGE-1	39.52	# 22
Document Summarization	CNN / Daily Mail	LightConv	ROUGE-2	15.97	# 22
Document Summarization	CNN / Daily Mail	LightConv	ROUGE-L	36.51	# 21
Abstractive Text Summarization	CNN / Daily Mail	Dynamic Conv	ROUGE-1	39.84	# 45
Abstractive Text Summarization	CNN / Daily Mail	Dynamic Conv	ROUGE-2	16.25	# 49
Abstractive Text Summarization	CNN / Daily Mail	Dynamic Conv	ROUGE-L	36.73	# 43
Document Summarization	CNN / Daily Mail	DynamicConv	ROUGE-1	39.84	# 21
Document Summarization	CNN / Daily Mail	DynamicConv	ROUGE-2	16.25	# 20
Document Summarization	CNN / Daily Mail	DynamicConv	ROUGE-L	36.73	# 19
Machine Translation	IWSLT2014 German-English	LightConv	BLEU score	34.8	# 25
Machine Translation	IWSLT2014 German-English	DynamicConv	BLEU score	35.2	# 23
Language Modelling	One Billion Word	DynamicConv	PPL	26.67	# 13
Language Modelling	One Billion Word	DynamicConv	Number of params	0.34B	# 1
Machine Translation	WMT2014 English-French	DynamicConv	BLEU score	43.2	# 12
Machine Translation	WMT2014 English-French	LightConv	BLEU score	43.1	# 15
Machine Translation	WMT2014 English-German	LightConv	BLEU score	28.9	# 36
Machine Translation	WMT2014 English-German	LightConv	Number of Params	202M	# 9
Machine Translation	WMT2014 English-German	DynamicConv	BLEU score	29.7	# 18
Machine Translation	WMT2014 English-German	DynamicConv	Number of Params	213M	# 6
Machine Translation	WMT 2017 English-Chinese	LightConv	BLEU score	24.3	# 2
Machine Translation	WMT 2017 English-Chinese	DynamicConv	BLEU score	24.4	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/machine-translation-on-wmt-2017-english-1)](https://paperswithcode.com/sota/machine-translation-on-wmt-2017-english-1?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/machine-translation-on-wmt2014-english-french)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-french?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/language-modelling-on-one-billion-word)](https://paperswithcode.com/sota/language-modelling-on-one-billion-word?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/machine-translation-on-wmt2014-english-german)](https://paperswithcode.com/sota/machine-translation-on-wmt2014-english-german?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/document-summarization-on-cnn-daily-mail)](https://paperswithcode.com/sota/document-summarization-on-cnn-daily-mail?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/machine-translation-on-iwslt2014-german)](https://paperswithcode.com/sota/machine-translation-on-iwslt2014-german?p=pay-less-attention-with-lightweight-and)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pay-less-attention-with-lightweight-and/abstractive-text-summarization-on-cnn-daily)](https://paperswithcode.com/sota/abstractive-text-summarization-on-cnn-daily?p=pay-less-attention-with-lightweight-and)`

Pay Less Attention with Lightweight and Dynamic Convolutions

ICLR 2019 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step. In this paper, we show that a very lightweight convolution can perform competitively with the best reported self-attention results. Next, we introduce dynamic convolutions, which are simpler and more efficient than self-attention. We predict separate convolution kernels based solely on the current time step in order to determine the importance of context elements. The number of operations required by this approach scales linearly in the input length, whereas self-attention is quadratic. Experiments on large-scale machine translation, language modeling, and abstractive summarization show that dynamic convolutions improve over strong self-attention models. On the WMT'14 English-German test set, dynamic convolutions achieve a new state of the art of 29.7 BLEU.
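
To make the idea in the abstract concrete, here is a minimal pure-Python sketch of a kernel predicted "based solely on the current time step". It is not the authors' implementation. The function names, the bias-free projection `proj`, the causal left padding, and the grouping of channels into heads that share one kernel are all assumptions made for illustration; the softmax normalization and the depthwise, per-channel structure follow the Softmax and Depthwise Convolution entries in this page's Methods list.

```python
import math


def softmax(scores):
    """Normalize a list of scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def apply_kernels(x, kernels, kernel_size):
    """Causal depthwise convolution with one kernel per time step and head.

    x:       seq_len rows, each a list of dim channel values.
    kernels: kernels[t][h] holds kernel_size raw weights for step t, head h.
    """
    seq_len, dim = len(x), len(x[0])
    num_heads = len(kernels[0])
    head_dim = dim // num_heads  # assumes dim is divisible by num_heads
    out = []
    for t in range(seq_len):
        weights = [softmax(k) for k in kernels[t]]
        row = []
        for c in range(dim):
            w = weights[c // head_dim]  # channels in one head share a kernel
            acc = 0.0
            for j in range(kernel_size):
                src = t - (kernel_size - 1) + j  # the last tap lands on step t
                if src >= 0:  # positions before the sequence count as zeros
                    acc += w[j] * x[src][c]
            row.append(acc)
        out.append(row)
    return out


def lightweight_conv(x, fixed_kernels, kernel_size):
    """Lightweight variant: the same kernels are reused at every time step."""
    return apply_kernels(x, [fixed_kernels] * len(x), kernel_size)


def dynamic_conv(x, proj, num_heads, kernel_size):
    """Dynamic variant: the kernels for step t are a linear function of x[t].

    proj: dim rows of num_heads * kernel_size columns (hypothetical layer).
    """
    kernels = []
    for step in x:
        flat = [sum(step[i] * proj[i][j] for i in range(len(step)))
                for j in range(num_heads * kernel_size)]
        kernels.append([flat[h * kernel_size:(h + 1) * kernel_size]
                        for h in range(num_heads)])
    return apply_kernels(x, kernels, kernel_size)
```

In this sketch each output step touches at most `kernel_size` earlier positions, so the work is proportional to `seq_len * dim * kernel_size`, which grows linearly with the input length. Self-attention instead compares every position with every other position, which is where its quadratic cost comes from.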

Code

pytorch/fairseq (official) · 29,251 stars

bytedance/neurst · 293 stars

dqqcasia/st

dyunis/light_dynamic_conv

Tasks

Abstractive Text Summarization

Language Modelling

Machine Translation

Translation

Datasets

CNN/Daily Mail

WMT 2014

Billion Word Benchmark

Results from the Paper

Ranked #1 on Machine Translation on WMT 2017 English-Chinese

Methods

Convolution • Depthwise Convolution • DropConnect • DynamicConv • GLU • LightConv • Linear Layer • Softmax
