
# Location Sensitive Attention

Introduced by Chorowski et al. in Attention-Based Models for Speech Recognition

Location Sensitive Attention is an attention mechanism that extends the additive attention mechanism to use cumulative attention weights from previous decoder time steps as an additional feature. This encourages the model to move forward consistently through the input, mitigating potential failure modes where some subsequences are repeated or ignored by the decoder.

Starting with additive attention, where $h$ is a sequential representation from a BiRNN encoder and $s_{i-1}$ is the $(i-1)$-th state of a recurrent neural network (e.g. an LSTM or GRU):

$$e_{i, j} = w^{T}\tanh\left(W{s}_{i-1} + Vh_{j} + b\right)$$

where $w$ and $b$ are vectors and $W$ and $V$ are matrices. We extend this to be location-aware by making it take into account the alignment produced at the previous step. First, we extract $k$ vectors $f_{i,j} \in \mathbb{R}^{k}$ for every position $j$ of the previous alignment $\alpha_{i-1}$ by convolving it with a matrix $F \in \mathbb{R}^{k \times r}$:

$$f_{i} = F * \alpha_{i-1}$$
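As a rough illustration of this convolution, the NumPy sketch below treats each row of $F$ as a one-dimensional filter of width $r$ and slides it over the previous alignment. The function name `location_features`, the zero-padded `mode="same"` boundary handling, and the toy values are assumptions made for the example, not details given in the text above.

```python
import numpy as np

def location_features(alpha_prev, F):
    """Convolve the previous alignment with k filters of width r.

    alpha_prev: shape (T,), attention weights over T encoder positions.
    F: shape (k, r), one 1-D filter per row.
    Returns f of shape (T, k), so f[j] is the feature vector for position j.
    """
    # mode="same" returns one output per encoder position when T >= r.
    # Zero padding at the edges is an assumption of this sketch.
    rows = [np.convolve(alpha_prev, F[c], mode="same") for c in range(F.shape[0])]
    return np.stack(rows, axis=1)

# Toy example: T = 5 positions, k = 2 filters, r = 3 taps (all hypothetical).
alpha_prev = np.array([0.0, 0.1, 0.8, 0.1, 0.0])
F = np.array([
    [0.0, 1.0, 0.0],  # centred one-hot filter: copies the alignment unchanged
    [1.0, 0.0, 0.0],  # off-centre one-hot filter: shifts the alignment by one position
])
f = location_features(alpha_prev, F)
print(f.shape)  # one k-dimensional vector per encoder position
```

In a trained model the entries of $F$ are learned, so the hand-picked filters here only serve to show what kind of positional information each output channel can carry.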

These additional vectors $f_{i,j}$ are then used by the scoring mechanism $e_{i,j}$, where they are projected by an additional matrix $U$:

$$e_{i,j} = w^{T}\tanh\left(Ws_{i-1} + Vh_{j} + Uf_{i,j} + b\right)$$
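The full scoring step can be sketched in NumPy as follows. This is a minimal, unbatched illustration rather than a reference implementation: the function names, the dimension names (`d_s`, `d_h`, `d`), the random parameters, and the zero-padded `"same"` convolution are all assumptions. The final softmax that turns scores into an alignment is the usual normalization for additive attention and is not spelled out in the text above.

```python
import numpy as np

def location_sensitive_scores(s_prev, h, alpha_prev, W, V, U, F, w, b):
    """Return e[j] = w^T tanh(W s_prev + V h[j] + U f[j] + b) for every position j."""
    # f has shape (T, k): row j is the location feature vector f_{i,j}.
    # Zero-padded "same" convolution is an assumption; it needs T >= r.
    f = np.stack(
        [np.convolve(alpha_prev, F[c], mode="same") for c in range(F.shape[0])],
        axis=1,
    )
    # Broadcasting adds the (d,) decoder-state term and bias to each of the T rows.
    hidden = np.tanh(s_prev @ W.T + h @ V.T + f @ U.T + b)
    return hidden @ w  # shape (T,)

def softmax(e):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.exp(e - e.max())
    return z / z.sum()

# Hypothetical sizes: T encoder positions, d_s decoder state, d_h encoder state,
# d attention dimension, k filters of width r.
rng = np.random.default_rng(0)
T, d_s, d_h, d, k, r = 6, 4, 5, 8, 3, 3

s_prev = rng.normal(size=d_s)             # s_{i-1}
h = rng.normal(size=(T, d_h))             # h_1 ... h_T
alpha_prev = softmax(rng.normal(size=T))  # alpha_{i-1}
W = rng.normal(size=(d, d_s))
V = rng.normal(size=(d, d_h))
U = rng.normal(size=(d, k))
F = rng.normal(size=(k, r))
w = rng.normal(size=d)
b = rng.normal(size=d)

e = location_sensitive_scores(s_prev, h, alpha_prev, W, V, U, F, w, b)
alpha = softmax(e)  # new alignment alpha_i over the T encoder positions
print(alpha.round(3))
```

Setting `U` to zeros removes the location term and recovers the plain additive attention score, which makes it easy to compare the two variants on the same inputs.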
