Recurrent Neural Networks

ConvLSTM is a type of recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a given cell in the grid from the inputs and past states of its local neighbors, which is achieved by using a convolution operator in the state-to-state and input-to-state transitions. The key equations of ConvLSTM are shown below, where $*$ denotes the convolution operator and $\odot$ the Hadamard product:

$$ i_{t} = \sigma\left(W_{xi} * \mathcal{X}_{t} + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \odot \mathcal{C}_{t-1} + b_{i}\right) $$

$$ f_{t} = \sigma\left(W_{xf} * \mathcal{X}_{t} + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \odot \mathcal{C}_{t-1} + b_{f}\right) $$

$$ \mathcal{C}_{t} = f_{t} \odot \mathcal{C}_{t-1} + i_{t} \odot \tanh\left(W_{xc} * \mathcal{X}_{t} + W_{hc} * \mathcal{H}_{t-1} + b_{c}\right) $$

$$ o_{t} = \sigma\left(W_{xo} * \mathcal{X}_{t} + W_{ho} * \mathcal{H}_{t-1} + W_{co} \odot \mathcal{C}_{t} + b_{o}\right) $$

$$ \mathcal{H}_{t} = o_{t} \odot \tanh\left(\mathcal{C}_{t}\right) $$
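
To make the recurrences above concrete, here is a minimal sketch of a ConvLSTM cell in PyTorch. It is an illustration under stated assumptions, not the authors' implementation: the four gate pre-activations are computed with a single convolution over the concatenated $\mathcal{X}_t$ and $\mathcal{H}_{t-1}$, the peephole weights $W_{ci}, W_{cf}, W_{co}$ are elementwise parameters over fixed spatial dimensions (so `height` and `width` must be known in advance), and the class and argument names are hypothetical.

```python
import torch
import torch.nn as nn


class ConvLSTMCell(nn.Module):
    """One step of the ConvLSTM recurrence defined by the equations above."""

    def __init__(self, in_channels: int, hidden_channels: int,
                 kernel_size: int = 3, height: int = 64, width: int = 64):
        super().__init__()
        padding = kernel_size // 2  # 'same' padding keeps the states at H x W
        self.hidden_channels = hidden_channels
        # One convolution produces the i, f, cell-candidate, and o
        # pre-activations; its bias covers b_i, b_f, b_c, b_o.
        self.conv = nn.Conv2d(in_channels + hidden_channels,
                              4 * hidden_channels, kernel_size,
                              padding=padding)
        # Peephole weights W_ci, W_cf, W_co for the Hadamard terms
        # (elementwise over channels x height x width, per the equations).
        self.w_ci = nn.Parameter(torch.zeros(hidden_channels, height, width))
        self.w_cf = nn.Parameter(torch.zeros(hidden_channels, height, width))
        self.w_co = nn.Parameter(torch.zeros(hidden_channels, height, width))

    def forward(self, x, h_prev, c_prev):
        # Convolve the concatenated input X_t and hidden state H_{t-1},
        # then split the result into the four gate pre-activations.
        gates = self.conv(torch.cat([x, h_prev], dim=1))
        xi, xf, xg, xo = gates.chunk(4, dim=1)
        i = torch.sigmoid(xi + self.w_ci * c_prev)       # input gate
        f = torch.sigmoid(xf + self.w_cf * c_prev)       # forget gate
        c = f * c_prev + i * torch.tanh(xg)              # new cell state C_t
        o = torch.sigmoid(xo + self.w_co * c)            # output gate uses C_t
        h = o * torch.tanh(c)                            # new hidden state H_t
        return h, c
```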

If we view the states as the hidden representations of moving objects, a ConvLSTM with a larger transitional kernel should be able to capture faster motions, while one with a smaller kernel can capture slower motions.

To ensure that the states have the same number of rows and columns as the inputs, padding is needed before applying the convolution operation. Here, padding of the hidden states on the boundary points can be viewed as using the state of the outside world for the calculation. Usually, before the first input arrives, we initialize all the states of the LSTM to zero, which corresponds to "total ignorance" of the future.
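
As a brief illustration of both points, the snippet below continues the hypothetical `ConvLSTMCell` sketch above: the 'same' padding inside the cell keeps the states at the input's spatial size, and the states are zero-initialized before the first frame.

```python
# (continuing from the ConvLSTMCell sketch above)
# Zero-initialize H_0 and C_0 before the first input ("total ignorance"),
# then step the cell over a short sequence of input frames.
batch, channels, height, width = 2, 1, 64, 64
cell = ConvLSTMCell(in_channels=channels, hidden_channels=16,
                    kernel_size=3, height=height, width=width)
h = torch.zeros(batch, 16, height, width)
c = torch.zeros(batch, 16, height, width)
for t in range(10):
    x_t = torch.randn(batch, channels, height, width)  # stand-in input frame
    h, c = cell(x_t, h, c)  # states keep the input's rows and columns
```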

Source: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
