Recurrent Neural Networks

Mogrifier LSTM

Introduced by Melis et al. in Mogrifier LSTM

The Mogrifier LSTM is an extension to the LSTM where the LSTM’s input $\mathbf{x}$ is gated conditioned on the output of the previous step $\mathbf{h}_{prev}$. Next, the gated input is used in a similar manner to gate the output of the previous time step. After a couple of rounds of this mutual gating, the last updated $\mathbf{x}$ and $\mathbf{h}_{prev}$ are fed to an LSTM.

In detail, the Mogrifier is an LSTM where two inputs $\mathbf{x}$ and $\mathbf{h}_{prev}$ modulate one another in an alternating fashion before the usual LSTM computation takes place. That is: $ \text{Mogrify}\left(\mathbf{x}, \mathbf{c}_{prev}, \mathbf{h}_{prev}\right) = \text{LSTM}\left(\mathbf{x}^{\uparrow}, \mathbf{c}_{prev}, \mathbf{h}^{\uparrow}_{prev}\right)$ where the modulated inputs $\mathbf{x}^{\uparrow}$ and $\mathbf{h}^{\uparrow}_{prev}$ are defined as the highest indexed $\mathbf{x}^{i}$ and $\mathbf{h}^{i}_{prev}$, respectively, from the interleaved sequences:

$$ \mathbf{x}^{i} = 2\sigma\left(\mathbf{Q}^{i}\mathbf{h}^{i-1}_{prev}\right) \odot \mathbf{x}^{i-2} \text{ for odd } i \in \left[1 \dots r\right] $$

$$ \mathbf{h}^{i}_{prev} = 2\sigma\left(\mathbf{R}^{i}\mathbf{x}^{i-1}\right) \odot \mathbf{h}^{i-2}_{prev} \text{ for even } i \in \left[1 \dots r\right] $$

with $\mathbf{x}^{-1} = \mathbf{x}$ and $\mathbf{h}^{0}_{prev} = \mathbf{h}_{prev}$. The number of "rounds", $r \in \mathbb{N}$, is a hyperparameter; $r = 0$ recovers the LSTM. Multiplication with the constant 2 ensures that randomly initialized $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices result in transformations close to identity. To reduce the number of additional model parameters, we typically factorize the $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices as products of low-rank matrices: $\mathbf{Q}^{i} = \mathbf{Q}^{i}_{left}\mathbf{Q}^{i}_{right}$ with $\mathbf{Q}^{i} \in \mathbb{R}^{m\times{n}}$, $\mathbf{Q}^{i}_{left} \in \mathbb{R}^{m\times{k}}$, $\mathbf{Q}^{i}_{right} \in \mathbb{R}^{k\times{n}}$, where $k < \min\left(m, n\right)$ is the rank.
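The gating rounds and the low-rank parameterization above can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' implementation: the `mogrify` and `low_rank` helpers and the specific shapes $m = 4$, $n = 3$, $r = 5$, $k = 2$ are assumptions for demonstration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mogrify(x, h_prev, Qs, Rs):
    """Alternately gate x (odd rounds, using Qs) and h_prev (even rounds,
    using Rs) before the usual LSTM computation. The total number of
    rounds is r = len(Qs) + len(Rs); empty lists recover the plain LSTM."""
    r = len(Qs) + len(Rs)
    for i in range(1, r + 1):
        if i % 2 == 1:
            # x^i = 2*sigma(Q^i h^{i-1}_prev) (elementwise *) x^{i-2}
            x = 2.0 * sigmoid(Qs[(i - 1) // 2] @ h_prev) * x
        else:
            # h^i_prev = 2*sigma(R^i x^{i-1}) (elementwise *) h^{i-2}_prev
            h_prev = 2.0 * sigmoid(Rs[i // 2 - 1] @ x) * h_prev
    return x, h_prev

def low_rank(m, n, k, rng):
    """Q^i = Q_left @ Q_right with rank k < min(m, n)."""
    return rng.normal(size=(m, k)) @ rng.normal(size=(k, n))

# Example with assumed sizes: x in R^4, h_prev in R^3, r = 5 rounds, rank 2.
rng = np.random.default_rng(0)
m, n, r, k = 4, 3, 5, 2
Qs = [low_rank(m, n, k, rng) for _ in range((r + 1) // 2)]  # odd rounds
Rs = [low_rank(n, m, k, rng) for _ in range(r // 2)]        # even rounds
x_up, h_up = mogrify(np.ones(m), np.ones(n), Qs, Rs)
# x_up and h_up would then be fed to a standard LSTM cell.
```

Note how the factor 2 interacts with the sigmoid: with all-zero $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$, each gate is $2\sigma(0) = 1$, so the inputs pass through unchanged, which is why random near-zero initialization starts close to the identity.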

Source: Mogrifier LSTM


| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 1 | 100.00% |


Component Type: Recurrent Neural Networks