Pay Less Attention with Lightweight and Dynamic Convolutions

ICLR 2019 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step...
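For intuition, here is a minimal PyTorch sketch of the two operations the paper names, LightConv and DynamicConv. This is not the paper's fairseq implementation (which adds causal padding for decoding, weight dropout, and GLU input projections); the function and parameter names below are illustrative.

```python
import torch
import torch.nn.functional as F

def light_conv(x, weight):
    """LightConv: depthwise 1D convolution whose kernel weights are
    softmax-normalized over the window and shared across channel groups.

    x      : (B, C, T) batch of sequences, channels first
    weight : (H, k) one kernel per head, shared by C // H channels
             (k assumed odd so the output length stays T)
    """
    B, C, T = x.shape
    H, k = weight.shape
    w = F.softmax(weight, dim=-1)                    # normalize over the window
    w = w.repeat_interleave(C // H, dim=0)           # (C, k): tie weights within each group
    w = w.unsqueeze(1)                               # (C, 1, k) for depthwise conv
    return F.conv1d(x, w, padding=k // 2, groups=C)  # one kernel per channel

def dynamic_conv(x, proj, H, k):
    """DynamicConv: a separate kernel is predicted from each time step alone,
    so the cost is linear in sequence length (self-attention is quadratic).

    x    : (B, T, C) batch of sequences, channels last
    proj : torch.nn.Linear(C, H * k) predicting per-position kernels
    """
    B, T, C = x.shape
    w = F.softmax(proj(x).view(B, T, H, k), dim=-1)  # (B, T, H, k) kernels
    pad = k // 2
    xp = F.pad(x, (0, 0, pad, k - 1 - pad))          # pad the time dimension
    win = xp.unfold(1, k, 1)                         # (B, T, C, k) sliding windows
    win = win.view(B, T, H, C // H, k)               # split channels into heads
    out = torch.einsum('bthck,bthk->bthc', win, w)   # weighted sum over each window
    return out.reshape(B, T, C)

# Usage sketch: 8 heads with a window of 3 over 64-channel sequences.
x = torch.randn(2, 10, 64)
proj = torch.nn.Linear(64, 8 * 3)
y = dynamic_conv(x, proj, H=8, k=3)                  # -> (2, 10, 64)
```

The key contrast with self-attention: the weights over context elements come from a fixed (LightConv) or input-dependent (DynamicConv) kernel of width k rather than from comparing every pair of positions.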

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Document Summarization | CNN / Daily Mail | DynamicConv | ROUGE-1 | 39.84 | #8 |
| Document Summarization | CNN / Daily Mail | DynamicConv | ROUGE-2 | 16.25 | #7 |
| Document Summarization | CNN / Daily Mail | DynamicConv | ROUGE-L | 36.73 | #8 |
| Document Summarization | CNN / Daily Mail | LightConv | ROUGE-1 | 39.52 | #9 |
| Document Summarization | CNN / Daily Mail | LightConv | ROUGE-2 | 15.97 | #9 |
| Document Summarization | CNN / Daily Mail | LightConv | ROUGE-L | 36.51 | #9 |
| Machine Translation | IWSLT2014 German-English | LightConv | BLEU score | 34.8 | #4 |
| Machine Translation | IWSLT2014 German-English | DynamicConv | BLEU score | 35.2 | #3 |
| Language Modelling | One Billion Word | DynamicConv | PPL | 26.67 | #6 |
| Language Modelling | One Billion Word | DynamicConv | Number of params | 0.34B | #1 |
| Machine Translation | WMT2014 English-French | LightConv | BLEU score | 43.1 | #7 |
| Machine Translation | WMT2014 English-German | DynamicConv | BLEU score | 29.7 | #7 |
| Machine Translation | WMT 2017 English-Chinese | DynamicConv | BLEU score | 24.4 | #1 |
| Machine Translation | WMT 2017 English-Chinese | LightConv | BLEU score | 24.3 | #2 |