Recurrent Neural Networks

ConvLSTM is a type of recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a certain cell in the grid from the inputs and past states of its local neighbors. This can easily be achieved by using a convolution operator in the state-to-state and input-to-state transitions (see the figure). The key equations of ConvLSTM are shown below, where $\ast$ denotes the convolution operator and $\odot$ the Hadamard product:

$$ i_{t} = \sigma\left(W_{xi} \ast \mathcal{X}_{t} + W_{hi} \ast \mathcal{H}_{t-1} + W_{ci} \odot \mathcal{C}_{t-1} + b_{i}\right) $$

$$ f_{t} = \sigma\left(W_{xf} \ast \mathcal{X}_{t} + W_{hf} \ast \mathcal{H}_{t-1} + W_{cf} \odot \mathcal{C}_{t-1} + b_{f}\right) $$

$$ \mathcal{C}_{t} = f_{t} \odot \mathcal{C}_{t-1} + i_{t} \odot \tanh\left(W_{xc} \ast \mathcal{X}_{t} + W_{hc} \ast \mathcal{H}_{t-1} + b_{c}\right) $$

$$ o_{t} = \sigma\left(W_{xo} \ast \mathcal{X}_{t} + W_{ho} \ast \mathcal{H}_{t-1} + W_{co} \odot \mathcal{C}_{t} + b_{o}\right) $$

$$ \mathcal{H}_{t} = o_{t} \odot \tanh\left(\mathcal{C}_{t}\right) $$
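The gate equations above can be sketched as a single ConvLSTM step. The following is a minimal NumPy sketch for single-channel maps, not the authors' implementation; all names (`conv_lstm_step`, `same_conv`, the weight-dictionary layout) are illustrative assumptions:

```python
import numpy as np

def same_conv(x, w):
    """'Same'-padded, stride-1 2D convolution (cross-correlation) of a
    single-channel map x (H, W) with an odd-sized kernel w (k, k)."""
    k = w.shape[0]
    xp = np.pad(x, k // 2)  # zero padding keeps the output at (H, W)
    H, W = x.shape
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(X, H_prev, C_prev, W, b):
    """One ConvLSTM step following the equations above.
    W maps gate names to kernels (conv terms) or per-pixel peephole
    weights (Hadamard terms); b maps gate names to biases."""
    i = sigmoid(same_conv(X, W['xi']) + same_conv(H_prev, W['hi'])
                + W['ci'] * C_prev + b['i'])            # input gate
    f = sigmoid(same_conv(X, W['xf']) + same_conv(H_prev, W['hf'])
                + W['cf'] * C_prev + b['f'])            # forget gate
    C = f * C_prev + i * np.tanh(same_conv(X, W['xc'])
                                 + same_conv(H_prev, W['hc']) + b['c'])
    o = sigmoid(same_conv(X, W['xo']) + same_conv(H_prev, W['ho'])
                + W['co'] * C + b['o'])                 # output gate
    H = o * np.tanh(C)                                  # new hidden state
    return H, C

# Example usage on a 5x5 grid with 3x3 transitional kernels.
rng = np.random.default_rng(0)
k, rows, cols = 3, 5, 5
W = {n: 0.1 * rng.standard_normal((k, k))
     for n in ['xi', 'hi', 'xf', 'hf', 'xc', 'hc', 'xo', 'ho']}
W.update({n: 0.1 * rng.standard_normal((rows, cols))
          for n in ['ci', 'cf', 'co']})                 # peephole weights
b = {n: 0.0 for n in 'ifco'}
H0 = np.zeros((rows, cols))  # zero-initialized states
C0 = np.zeros((rows, cols))
X1 = rng.standard_normal((rows, cols))
H1, C1 = conv_lstm_step(X1, H0, C0, W, b)
```

Because every convolution uses 'same' padding, the hidden and cell states keep the input's spatial dimensions at every step.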

If we view the states as the hidden representations of moving objects, a ConvLSTM with a larger transitional kernel should be able to capture faster motions while one with a smaller kernel can capture slower motions.

To ensure that the states have the same number of rows and columns as the inputs, padding is needed before applying the convolution operation. Here, padding of the hidden states on the boundary points can be viewed as using the state of the outside world for calculation. Usually, before the first input arrives, we initialize all the states of the LSTM to zero, which corresponds to "total ignorance" of the future.
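The padding claim is easy to verify with the standard stride-1 output-size formula: with zero padding $p = \lfloor k/2 \rfloor$, an odd-sized $k \times k$ kernel maps an $H \times W$ input to an $H \times W$ output. A quick illustrative check (the function name is hypothetical, not from the paper):

```python
def same_conv_shape(h, w, k):
    """Output size of a stride-1 convolution with zero padding p = k // 2:
    h + 2p - k + 1 rows, and likewise for columns."""
    p = k // 2
    return h + 2 * p - k + 1, w + 2 * p - k + 1

for k in (1, 3, 5):
    print(k, same_conv_shape(64, 64, k))  # each kernel size yields (64, 64)
```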

Source: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting
