Mogrifier LSTM Explained | Papers With Code

The **Mogrifier LSTM** is an extension to the [LSTM](https://paperswithcode.com/method/lstm) where the LSTM’s input $\mathbf{x}$ is gated conditioned on the output of the previous step $\mathbf{h}\_{prev}$. Next, the gated input is used in a similar manner to gate the output of the previous time step. After a couple of rounds of this mutual gating, the last updated $\mathbf{x}$ and $\mathbf{h}\_{prev}$ are fed to an LSTM.

In detail, the Mogrifier is an LSTM where two inputs $\mathbf{x}$ and $\mathbf{h}\_{prev}$ modulate one another in an alternating fashion before the usual LSTM computation takes place. That is: $\text{Mogrify}\left(\mathbf{x}, \mathbf{c}\_{prev}, \mathbf{h}\_{prev}\right) = \text{LSTM}\left(\mathbf{x}^{\uparrow}, \mathbf{c}\_{prev}, \mathbf{h}^{\uparrow}\_{prev}\right)$ where the modulated inputs $\mathbf{x}^{\uparrow}$ and $\mathbf{h}^{\uparrow}\_{prev}$ are defined as the highest indexed $\mathbf{x}^{i}$ and $\mathbf{h}^{i}\_{prev}$, respectively, from the interleaved sequences:

$$ \mathbf{x}^{i} = 2\sigma\left(\mathbf{Q}^{i}\mathbf{h}^{i-1}\_{prev}\right) \odot \mathbf{x}^{i-2} \text{ for odd } i \in \left[1 \dots r\right] $$

$$ \mathbf{h}^{i}\_{prev}  = 2\sigma\left(\mathbf{R}^{i}\mathbf{x}^{i-1}\right) \odot \mathbf{h}^{i-2}\_{prev} \text{ for even } i \in \left[1 \dots r\right] $$

with $\mathbf{x}^{-1} = \mathbf{x}$ and $\mathbf{h}^{0}\_{prev} = \mathbf{h}\_{prev}$. The number of "rounds", $r \in \mathbb{N}$, is a hyperparameter; $r = 0$ recovers the LSTM. Multiplication with the constant 2 ensures that randomly initialized $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices result in transformations close to identity. To reduce the number of additional model parameters, we typically factorize the $\mathbf{Q}^{i}$, $\mathbf{R}^{i}$ matrices as products of low-rank matrices: $\mathbf{Q}^{i} = \mathbf{Q}^{i}\_{left}\mathbf{Q}^{i}\_{right}$ with $\mathbf{Q}^{i} \in \mathbb{R}^{m\times{n}}$, $\mathbf{Q}^{i}\_{left} \in \mathbb{R}^{m\times{k}}$, $\mathbf{Q}^{i}\_{right} \in \mathbb{R}^{k\times{n}}$, where $k < \min\left(m, n\right)$ is the rank.
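The alternating gating above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names (`sigmoid`, `matvec`, `gate`, `mogrify`, `rand_matrix`) are hypothetical, vectors are plain lists, and the shapes of the $\mathbf{R}^{i}$ factors ($n \times k$ and $k \times m$) are an assumption that mirrors the factorization given for $\mathbf{Q}^{i}$. The final LSTM step is omitted; the function only returns the modulated $\mathbf{x}^{\uparrow}$ and $\mathbf{h}^{\uparrow}\_{prev}$.

```python
import math
import random


def sigmoid(z):
    """Logistic function for a single float."""
    return 1.0 / (1.0 + math.exp(-z))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def gate(w_left, w_right, source, target):
    """Return 2 * sigmoid(w_left @ w_right @ source) * target, elementwise."""
    pre_activation = matvec(w_left, matvec(w_right, source))
    return [2.0 * sigmoid(p) * t for p, t in zip(pre_activation, target)]


def mogrify(x, h_prev, Q, R, rounds):
    """Apply `rounds` steps of mutual gating to x and h_prev.

    Q maps each odd round index i to the pair (Q_left, Q_right);
    R maps each even round index i to the pair (R_left, R_right).
    With rounds == 0 the inputs are returned unchanged (plain LSTM inputs).
    """
    for i in range(1, rounds + 1):
        if i % 2 == 1:
            # Odd round: the latest h_prev gates x.
            x = gate(Q[i][0], Q[i][1], h_prev, x)
        else:
            # Even round: the latest x gates h_prev.
            h_prev = gate(R[i][0], R[i][1], x, h_prev)
    return x, h_prev


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the init scale is an arbitrary choice for the demo."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


# Demo sizes (assumed): input size m, hidden size n, rank k, r rounds.
rng = random.Random(0)
m, n, k, r = 4, 3, 2, 5
Q = {i: (rand_matrix(m, k, rng), rand_matrix(k, n, rng)) for i in range(1, r + 1, 2)}
R = {i: (rand_matrix(n, k, rng), rand_matrix(k, m, rng)) for i in range(2, r + 1, 2)}
x = [rng.uniform(-1.0, 1.0) for _ in range(m)]
h_prev = [rng.uniform(-1.0, 1.0) for _ in range(n)]

x_up, h_up = mogrify(x, h_prev, Q, R, r)
```

With all-zero weight matrices each gate evaluates to $2\sigma(0) = 1$, so every round is exactly the identity; this is the sense in which small random initializations give transformations close to identity.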

Attached collections: Recurrent Neural Networks
| Task | Papers | Share |
| --- | --- | --- |
| Language Modelling | 2 | 50.00% |
| Music Modeling | 1 | 25.00% |
| Sentiment Analysis | 1 | 25.00% |
