Additive Attention

Introduced by Bahdanau et al. in Neural Machine Translation by Jointly Learning to Align and Translate

Additive Attention, also known as Bahdanau Attention, uses a feed-forward network with a single hidden layer to calculate the attention alignment score:

$$f_{att}\left(\textbf{h}_{i}, \textbf{s}_{j}\right) = \textbf{v}_{a}^{T}\tanh\left(\textbf{W}_{a}\left[\textbf{h}_{i};\textbf{s}_{j}\right]\right)$$

where $\textbf{v}_{a}$ and $\textbf{W}_{a}$ are learned attention parameters, $\textbf{h}_{i}$ is the $i$-th hidden state of the encoder, $\textbf{s}_{j}$ is the $j$-th hidden state of the decoder, and $\left[\textbf{h}_{i};\textbf{s}_{j}\right]$ denotes their concatenation. The function above is thus a type of alignment score function. Arranging the scores for every pair $(i, j)$ into a matrix shows how strongly each source word is associated with each target word, as the Figure to the right shows.
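To make the score function concrete, here is a minimal sketch in plain Python. The function name `additive_score` and the toy values for `h_i`, `s_j`, `W_a`, and `v_a` are assumptions chosen for illustration; in a real model the parameters would be learned, and the computation would normally be batched in a tensor library.

```python
import math

def additive_score(h_i, s_j, W_a, v_a):
    """Return v_a^T tanh(W_a [h_i; s_j]) for one encoder/decoder state pair."""
    # Concatenate the encoder and decoder states: [h_i; s_j]
    concat = list(h_i) + list(s_j)
    # Hidden layer: one tanh unit per row of W_a
    hidden = [math.tanh(sum(w * x for w, x in zip(row, concat))) for row in W_a]
    # Project the hidden layer to a scalar score with v_a
    return sum(v * z for v, z in zip(v_a, hidden))

# Toy values (hypothetical, not learned parameters)
h_i = [1.0, 0.0]                 # encoder hidden state
s_j = [0.0, 1.0]                 # decoder hidden state
W_a = [[0.5, 0.0, 0.0, 0.5],     # shape: hidden units x (len(h_i) + len(s_j))
       [0.0, 0.0, 0.0, 0.0]]
v_a = [1.0, 1.0]

score = additive_score(h_i, s_j, W_a, v_a)
```

With these toy values the first hidden unit sees a pre-activation of 1.0 and the second sees 0.0, so the score equals tanh(1.0).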

Within a neural network, once we have the alignment scores, we compute the final attention weights by applying a softmax function over them, which ensures the weights are positive and sum to 1.
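The normalization step can be sketched as follows. The helper name `softmax` and the example scores are assumptions for illustration; subtracting the maximum score before exponentiating is a common numerical-stability choice and does not change the result.

```python
import math

def softmax(scores):
    """Turn raw alignment scores into weights that sum to 1."""
    m = max(scores)                            # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical alignment scores of one decoder state against three encoder states
alignment_scores = [2.0, 1.0, 0.1]
weights = softmax(alignment_scores)
```

Higher alignment scores receive larger weights, and the ordering of the scores is preserved.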

Source: Neural Machine Translation by Jointly Learning to Align and Translate

Tasks

Task	Papers	Share
Speech Synthesis	43	11.81%
Sentence	28	7.69%
Reinforcement Learning (RL)	15	4.12%
Text-To-Speech Synthesis	15	4.12%
Combinatorial Optimization	14	3.85%
Language Modelling	10	2.75%
Speech Recognition	8	2.20%
Starcraft II	7	1.92%
Translation	7	1.92%

Components

Component	Type
Tanh Activation	Activation Functions

Categories

Attention Mechanisms