Quasi-Recurrent Neural Network

Introduced by Bradbury et al. in Quasi-Recurrent Neural Networks

A QRNN, or Quasi-Recurrent Neural Network, is a type of recurrent neural network that alternates convolutional layers, which apply in parallel across timesteps, with a minimalist recurrent pooling function that applies in parallel across channels. Because of this increased parallelism, QRNNs can be up to 16 times faster than LSTMs at train and test time.

Given an input sequence $\mathbf{X} \in \mathbb{R}^{T\times{n}}$ of $T$ $n$-dimensional vectors $\mathbf{x}_{1}, \dots, \mathbf{x}_{T}$, the convolutional subcomponent of a QRNN performs convolutions in the timestep dimension with a bank of $m$ filters, producing a sequence $\mathbf{Z} \in \mathbb{R}^{T\times{m}}$ of $m$-dimensional candidate vectors $\mathbf{z}_{t}$. Masked convolutions are used so that filters cannot access information from future timesteps (implemented with left padding).
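
The masked convolution described above can be sketched in NumPy as follows. This is a minimal illustration rather than the authors' implementation: the function name `masked_conv1d`, the toy sizes, and the convention that the last filter tap `W[k - 1]` multiplies the current timestep are assumptions made for this example.

```python
import numpy as np

def masked_conv1d(X, W):
    """Causal convolution along the timestep axis (illustrative sketch).

    X: (T, n) input sequence; W: (k, n, m) filter bank of width k.
    Returns a (T, m) array whose row t depends only on X[t-k+1 .. t].
    """
    k, n, m = W.shape
    T = X.shape[0]
    # Left-pad with k - 1 zero vectors so no filter sees a future timestep.
    X_pad = np.concatenate([np.zeros((k - 1, n)), X], axis=0)
    out = np.zeros((T, m))
    for t in range(T):
        window = X_pad[t:t + k]  # rows X[t-k+1 .. t], shape (k, n)
        out[t] = np.einsum("kn,knm->m", window, W)
    return out

rng = np.random.default_rng(0)
T, n, m, k = 6, 4, 3, 2          # assumed toy sizes
X = rng.normal(size=(T, n))
W_z = rng.normal(size=(k, n, m))
Z = np.tanh(masked_conv1d(X, W_z))   # candidate vectors z_t

# Perturbing inputs from timestep 4 onward must leave Z[:4] unchanged.
X_future = X.copy()
X_future[4:] += 1.0
Z_future = np.tanh(masked_conv1d(X_future, W_z))
causal_ok = np.allclose(Z[:4], Z_future[:4])
```

The explicit loop over `t` is only for readability; each output row is independent of the others, which is why this step can run in parallel across timesteps.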

Additional convolutions are applied with separate filter banks to obtain sequences of vectors for the elementwise gates that are needed for the pooling function. While the candidate vectors are passed through a $\tanh$ nonlinearity, the gates use an elementwise sigmoid. If the pooling function requires a forget gate $\mathbf{f}_{t}$ and an output gate $\mathbf{o}_{t}$ at each timestep, the full set of computations in the convolutional component is then:

$$ \mathbf{Z} = \tanh\left(\mathbf{W}_{z} \ast \mathbf{X}\right) $$

$$ \mathbf{F} = \sigma\left(\mathbf{W}_{f} \ast \mathbf{X}\right) $$

$$ \mathbf{O} = \sigma\left(\mathbf{W}_{o} \ast \mathbf{X}\right) $$

where $\mathbf{W}_{z}$, $\mathbf{W}_{f}$, and $\mathbf{W}_{o}$, each in $\mathbb{R}^{k \times n \times m}$, are the convolutional filter banks, $k$ is the filter width, and $\ast$ denotes a masked convolution along the timestep dimension. The simplest pooling function, which Balduzzi & Ghifary (2016) call dynamic average pooling, uses only a forget gate:

$$ \mathbf{h}_{t} = \mathbf{f}_{t} \odot \mathbf{h}_{t-1} + \left(1 - \mathbf{f}_{t}\right) \odot \mathbf{z}_{t} $$

This is denoted f-pooling. The function may also include an output gate:

$$ \mathbf{c}_{t} = \mathbf{f}_{t} \odot \mathbf{c}_{t-1} + \left(1 - \mathbf{f}_{t}\right) \odot \mathbf{z}_{t} $$

$$ \mathbf{h}_{t} = \mathbf{o}_{t} \odot{\mathbf{c}_{t}} $$

This is denoted fo-pooling. Alternatively, the recurrence relation may include independent input and forget gates:

$$ \mathbf{c}_{t} = \mathbf{f}_{t} \odot \mathbf{c}_{t-1} + \mathbf{i}_{t} \odot \mathbf{z}_{t} $$

$$ \mathbf{h}_{t} = \mathbf{o}_{t} \odot{\mathbf{c}_{t}} $$

This is denoted ifo-pooling. In each case $\mathbf{h}$ or $\mathbf{c}$ is initialized to zero. The recurrent parts of these functions must be calculated for each timestep in sequence, but because they are parallel along the feature dimensions, evaluating them even over long sequences requires a negligible amount of computation time.
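
The three pooling variants can be sketched in NumPy as follows. The function names and the small hand-checkable inputs are assumptions made for this example; the gate sequences are taken as given `(T, m)` arrays rather than computed by convolutions.

```python
import numpy as np

def f_pooling(Z, F):
    """h_t = f_t * h_{t-1} + (1 - f_t) * z_t, with h_0 = 0."""
    h = np.zeros(Z.shape[1])
    H = np.empty_like(Z)
    for t in range(Z.shape[0]):              # sequential over timesteps
        h = F[t] * h + (1.0 - F[t]) * Z[t]   # elementwise over channels
        H[t] = h
    return H

def fo_pooling(Z, F, O):
    """Same recurrence on the cell state c_t, then h_t = o_t * c_t."""
    C = f_pooling(Z, F)
    return O * C

def ifo_pooling(Z, F, I, O):
    """c_t = f_t * c_{t-1} + i_t * z_t, then h_t = o_t * c_t."""
    c = np.zeros(Z.shape[1])
    H = np.empty_like(Z)
    for t in range(Z.shape[0]):
        c = F[t] * c + I[t] * Z[t]
        H[t] = O[t] * c
    return H

# Toy input: T = 3 timesteps, m = 2 channels, all gates fixed at 0.5.
Z = np.array([[1.0, -1.0]] * 3)
F = np.full((3, 2), 0.5)
O = np.full((3, 2), 0.5)

H_f = f_pooling(Z, F)      # rows: [0.5, -0.5], [0.75, -0.75], [0.875, -0.875]
H_fo = fo_pooling(Z, F, O)
H_ifo = ifo_pooling(Z, F, 1.0 - F, O)
```

Only the loop over `t` is sequential; every update inside it is an elementwise operation on $m$-dimensional vectors. Setting the input gate to `1 - F`, as in the last call, makes ifo-pooling coincide with fo-pooling.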

A single QRNN layer thus performs an input-dependent pooling, followed by a gated linear combination of convolutional features. As with convolutional neural networks, two or more QRNN layers should be stacked to create a model with the capacity to approximate more complex functions.
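
As a rough illustration of stacking, the sketch below composes two QRNN layers with fo-pooling in NumPy. The helper names (`masked_conv`, `qrnn_layer`, `init_layer`), the random initialization scale, and the toy sizes are all assumptions made for this example, not part of the original method description.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def masked_conv(X, W):
    """Masked convolution over time; X is (T, n), W is (k, n, m)."""
    k = W.shape[0]
    T = X.shape[0]
    X_pad = np.pad(X, ((k - 1, 0), (0, 0)))  # zero padding on the left only
    # Tap j sees timestep t - (k - 1) + j, so the last tap sees timestep t.
    return sum(X_pad[j:j + T] @ W[j] for j in range(k))

def qrnn_layer(X, W_z, W_f, W_o):
    """One QRNN layer: three masked convolutions, then fo-pooling."""
    Z = np.tanh(masked_conv(X, W_z))
    F = sigmoid(masked_conv(X, W_f))
    O = sigmoid(masked_conv(X, W_o))
    c = np.zeros(Z.shape[1])                 # cell state initialized to zero
    H = np.empty_like(Z)
    for t in range(Z.shape[0]):
        c = F[t] * c + (1.0 - F[t]) * Z[t]
        H[t] = O[t] * c
    return H

def init_layer(rng, k, n, m):
    """Hypothetical initializer: three small random filter banks."""
    return tuple(rng.normal(scale=0.1, size=(k, n, m)) for _ in range(3))

rng = np.random.default_rng(0)
T, n, m, k = 8, 5, 4, 2                      # assumed toy sizes
X = rng.normal(size=(T, n))
layer1 = init_layer(rng, k, n, m)            # maps n inputs to m channels
layer2 = init_layer(rng, k, m, m)            # consumes layer 1's output

H1 = qrnn_layer(X, *layer1)
H2 = qrnn_layer(H1, *layer2)                 # shape (T, m)
```

The second layer takes the hidden sequence of the first as its input, so its filter banks have input dimension $m$ rather than $n$. Because every convolution is masked, the stacked output at timestep $t$ still depends only on inputs up to $t$.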

Source: Quasi-Recurrent Neural Networks

Tasks

Task	Papers	Share
Time Series Prediction	3	13.04%
Decision Making	2	8.70%
Time Series Analysis	2	8.70%
Language Modelling	2	8.70%
Sentiment Analysis	2	8.70%
Ensemble Learning	1	4.35%
Feature Engineering	1	4.35%
Benchmarking	1	4.35%
Feature Importance	1	4.35%

Components

Component	Type
Convolution	Convolutions
Masked Convolution	Convolutions
Sigmoid Activation	Activation Functions
Tanh Activation	Activation Functions

Categories

Recurrent Neural Networks