Pay Less Attention with Lightweight and Dynamic Convolutions

ICLR 2019 · Felix Wu, Angela Fan, Alexei Baevski, Yann N. Dauphin, Michael Auli

Self-attention is a useful mechanism to build generative models for language and images. It determines the importance of context elements by comparing each element to the current time step…
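The paper's alternative to self-attention is the lightweight convolution: a depthwise convolution whose kernel weights are softmax-normalized over the kernel width and shared across groups of channels (heads), so the number of weights is small and fixed regardless of sequence length. The following is a minimal numpy sketch of that idea, not the paper's implementation; the function name, shapes, and looping style are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def light_conv(x, w, H):
    """Sketch of a lightweight convolution (assumed interface).

    x: (T, d) input sequence of length T with d channels.
    w: (H, k) raw kernel weights, one width-k kernel per head.
    H: number of heads; channels are split into H groups, and every
       channel in a group shares its head's kernel (weight sharing).
    """
    T, d = x.shape
    H_, k = w.shape
    assert d % H == 0 and H_ == H
    W = softmax(w, axis=-1)            # normalize weights over kernel width
    pad = k // 2
    xp = np.pad(x, ((pad, k - 1 - pad), (0, 0)))  # zero-pad in time
    out = np.zeros_like(x, dtype=float)
    for c in range(d):
        h = c * H // d                 # which head this channel belongs to
        for t in range(T):
            # depthwise conv: each channel only mixes its own time window
            out[t, c] = W[h] @ xp[t:t + k, c]
    return out
```

Because the normalized weights sum to one, the operation is a learned weighted average over a fixed local window; the dynamic variant (DynamicConv) additionally predicts `w` from the current time step's input rather than learning it as a static parameter.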


Evaluation results from the paper

| Task                | Dataset                 | Model       | Metric           | Value | Global rank |
|---------------------|-------------------------|-------------|------------------|-------|-------------|
| Language Modelling  | One Billion Word        | DynamicConv | PPL              | 26.67 | #6          |
| Language Modelling  | One Billion Word        | DynamicConv | Number of params | 0.34B | #1          |
| Machine Translation | WMT2014 English-French  | DynamicConv | BLEU score       | 43.2  | #4          |
| Machine Translation | WMT2014 English-French  | LightConv   | BLEU score       | 43.1  | #5          |
| Machine Translation | WMT2014 English-German  | DynamicConv | BLEU score       | 29.7  | #2          |
| Machine Translation | WMT2014 English-German  | LightConv   | BLEU score       | 28.9  | #8          |