Additive Attention, also known as Bahdanau Attention, uses a one-hidden-layer feed-forward network to calculate the attention alignment score:
$$f_{att}\left(\textbf{h}_{i}, \textbf{s}_{j}\right) = v_{a}^{T}\tanh\left(\textbf{W}_{a}\left[\textbf{h}_{i};\textbf{s}_{j}\right]\right)$$
where $\textbf{v}_{a}$ and $\textbf{W}_{a}$ are learned attention parameters. Here $\textbf{h}_{i}$ denotes the $i$-th encoder hidden state and $\textbf{s}_{j}$ the $j$-th decoder hidden state. The function above is thus a type of alignment score function, and a matrix of alignment scores can be used to visualise the correlation between source and target words.
Within a neural network, once we have the alignment scores, we compute the final attention weights by applying a softmax over these scores, ensuring they sum to 1.
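To make the two steps concrete, here is a minimal NumPy sketch of the scoring function and the softmax normalisation for a single decoder step. The function name, shapes, and random parameters are illustrative assumptions, not the paper's reference implementation:

```python
import numpy as np

def additive_attention(h, s_j, W_a, v_a):
    """Additive (Bahdanau) attention weights for one decoder step.

    h   : (T_enc, d_h)      encoder hidden states
    s_j : (d_s,)            current decoder hidden state
    W_a : (d_a, d_h + d_s)  learned projection
    v_a : (d_a,)            learned score vector
    Returns attention weights over the T_enc source positions.
    """
    # Concatenate each encoder state with the decoder state: [h_i ; s_j]
    concat = np.concatenate([h, np.tile(s_j, (h.shape[0], 1))], axis=1)  # (T_enc, d_h + d_s)
    # One-hidden-layer feed-forward net: v_a^T tanh(W_a [h_i ; s_j])
    scores = np.tanh(concat @ W_a.T) @ v_a                               # (T_enc,)
    # Softmax turns the alignment scores into weights that sum to 1
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Tiny usage example with random (hypothetical) parameters
rng = np.random.default_rng(0)
T_enc, d_h, d_s, d_a = 5, 8, 8, 16
h = rng.normal(size=(T_enc, d_h))
s_j = rng.normal(size=(d_s,))
W_a = rng.normal(size=(d_a, d_h + d_s))
v_a = rng.normal(size=(d_a,))
alpha = additive_attention(h, s_j, W_a, v_a)
print(alpha, alpha.sum())  # weights over source positions, summing to 1.0
```

In a full encoder-decoder model, `W_a` and `v_a` would be trained jointly with the rest of the network, and the resulting weights would be used to form a context vector as a weighted sum of the encoder states.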
Source: Neural Machine Translation by Jointly Learning to Align and Translate
| Task | Papers | Share |
|---|---|---|
| Speech Synthesis | 44 | 10.43% |
| Sentence | 28 | 6.64% |
| Decoder | 20 | 4.74% |
| Reinforcement Learning (RL) | 15 | 3.55% |
| Text-To-Speech Synthesis | 15 | 3.55% |
| Combinatorial Optimization | 14 | 3.32% |
| Reinforcement Learning | 12 | 2.84% |
| Language Modelling | 10 | 2.37% |
| Speech Recognition | 8 | 1.90% |