Understanding Attention Training via Output Relevance

16 Aug 2020  ·  Charlie Snell, Ruiqi Zhong, Jacob Steinhardt, Dan Klein ·

In recurrent models with attention, the learned attention weights sometimes correlate with individual token importance, even though the training objective does not explicitly reward this. To understand why, we study the training dynamics of attention for sequence classification and translation. We identify a quantity in the model, which we call the *output relevance*, and show that it drives the learning of attention. If we ablate attention by fixing it to uniform, the output relevance still correlates with the attention of a normally trained model; but if we instead ablate output relevance, attention cannot be learned. Using output relevance, we explain why attention correlates with gradient-based interpretations, and why, perhaps surprisingly, a Seq2Seq model with attention sometimes fails to learn a simple permutation copying task. Finally, we discuss evidence that multi-head attention improves not only expressiveness but also learning dynamics.
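To make the uniform-attention ablation concrete, here is a minimal sketch (not the paper's implementation) of dot-product attention with a flag that fixes the attention distribution to uniform; the function names and shapes are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, keys, values, uniform=False):
    """Dot-product attention over a sequence of encoder states.

    With uniform=True, the attention weights are fixed to 1/n,
    mirroring the ablation described above: the context vector
    then ignores the learned alignment scores entirely.
    (Illustrative sketch, not the authors' code.)
    """
    n = keys.shape[0]
    if uniform:
        weights = np.full(n, 1.0 / n)   # fixed uniform attention (ablation)
    else:
        scores = keys @ query           # alignment scores, one per position
        weights = softmax(scores)       # learned attention distribution
    context = weights @ values          # attention-weighted sum of values
    return context, weights
```

Under the ablation, only the values (and hence the output relevance of each position) can influence the context vector, which is what lets the correlation with normally trained attention be measured.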

