ConvLSTM is a type of recurrent neural network for spatiotemporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a certain cell in the grid from the inputs and past states of its local neighbors. This is achieved by using a convolution operator in the state-to-state and input-to-state transitions (see Figure). The key equations of ConvLSTM are shown below, where $*$ denotes the convolution operator and $\odot$ the Hadamard product:
$$ i_{t} = \sigma\left(W_{xi} * \mathcal{X}_{t} + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \odot \mathcal{C}_{t-1} + b_{i}\right) $$
$$ f_{t} = \sigma\left(W_{xf} * \mathcal{X}_{t} + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \odot \mathcal{C}_{t-1} + b_{f}\right) $$
$$ \mathcal{C}_{t} = f_{t} \odot \mathcal{C}_{t-1} + i_{t} \odot \tanh\left(W_{xc} * \mathcal{X}_{t} + W_{hc} * \mathcal{H}_{t-1} + b_{c}\right) $$
$$ o_{t} = \sigma\left(W_{xo} * \mathcal{X}_{t} + W_{ho} * \mathcal{H}_{t-1} + W_{co} \odot \mathcal{C}_{t} + b_{o}\right) $$
$$ \mathcal{H}_{t} = o_{t} \odot \tanh\left(\mathcal{C}_{t}\right) $$
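The gate equations can be sketched directly in NumPy. This is a minimal, unoptimized illustration, not the paper's implementation. The helper names (`conv2d_same`, `init_params`, `convlstm_step`), the parameter shapes, the random initialization and the use of zero "same" padding are assumptions made for the example. As in most deep learning libraries, the "convolution" is computed as a cross-correlation, which makes no difference for learned kernels. The peephole weights $W_{c\cdot}$ are full $(C_h, H, W)$ tensors, so they enter through an element-wise product rather than a convolution.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def conv2d_same(x, w):
    """Multi-channel 2-D cross-correlation with zero 'same' padding.

    x: (C_in, H, W) feature map; w: (C_out, C_in, k, k) kernel with odd k.
    Returns a (C_out, H, W) map, so the spatial size is preserved.
    """
    c_out, _, k, _ = w.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))  # zeros around the spatial border only
    _, height, width = x.shape
    out = np.zeros((c_out, height, width))
    for i in range(height):
        for j in range(width):
            patch = xp[:, i:i + k, j:j + k]  # (C_in, k, k) neighbourhood of cell (i, j)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out


def init_params(c_in, c_h, k, height, width, rng):
    """Hypothetical random initialization; a real model learns these weights."""
    p = {}
    for g in "ifco":  # input gate, forget gate, cell candidate, output gate
        p[f"W_x{g}"] = rng.normal(scale=0.1, size=(c_h, c_in, k, k))
        p[f"W_h{g}"] = rng.normal(scale=0.1, size=(c_h, c_h, k, k))
        p[f"b_{g}"] = np.zeros((c_h, 1, 1))  # per-channel bias, broadcast over H x W
    for g in "ifo":  # peephole weights, applied with a Hadamard product
        p[f"W_c{g}"] = rng.normal(scale=0.1, size=(c_h, height, width))
    return p


def convlstm_step(x, h_prev, c_prev, p):
    """One ConvLSTM update: returns (H_t, C_t) from X_t, H_{t-1}, C_{t-1}."""
    conv = conv2d_same
    i = sigmoid(conv(x, p["W_xi"]) + conv(h_prev, p["W_hi"]) + p["W_ci"] * c_prev + p["b_i"])
    f = sigmoid(conv(x, p["W_xf"]) + conv(h_prev, p["W_hf"]) + p["W_cf"] * c_prev + p["b_f"])
    c = f * c_prev + i * np.tanh(conv(x, p["W_xc"]) + conv(h_prev, p["W_hc"]) + p["b_c"])
    o = sigmoid(conv(x, p["W_xo"]) + conv(h_prev, p["W_ho"]) + p["W_co"] * c + p["b_o"])
    h = o * np.tanh(c)
    return h, c


rng = np.random.default_rng(0)
C_IN, C_H, K, HEIGHT, WIDTH = 1, 4, 3, 8, 8
params = init_params(C_IN, C_H, K, HEIGHT, WIDTH, rng)
h = np.zeros((C_H, HEIGHT, WIDTH))  # zero initial hidden state
c = np.zeros((C_H, HEIGHT, WIDTH))  # zero initial cell state
for _ in range(5):
    x_t = rng.normal(size=(C_IN, HEIGHT, WIDTH))  # stand-in for one input frame
    h, c = convlstm_step(x_t, h, c, params)
print(h.shape)  # (4, 8, 8)
```

Because every convolution is padded to "same" size, $\mathcal{X}_t$, $\mathcal{H}_t$ and $\mathcal{C}_t$ all share the same spatial grid, which is what allows the element-wise peephole and gating products.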
If we view the states as hidden representations of moving objects, a ConvLSTM with a larger transitional (state-to-state) kernel should be able to capture faster motions, while one with a smaller kernel can capture slower motions: the kernel size bounds how far information can travel across the grid in a single time step.
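The kernel-size intuition can be checked with a toy calculation. With zero padding, one $k \times k$ convolution lets an isolated active cell influence neighbours at most $(k-1)/2$ cells away. The sketch below assumes a single channel and an all-ones kernel purely for illustration, and `reach_per_step` is a hypothetical helper; learned kernels are not uniform, and stacking ConvLSTM layers enlarges the reach further.

```python
import numpy as np


def reach_per_step(k, size=15):
    """How far (Chebyshev distance, in cells) an impulse spreads through one
    zero-padded 'same' k x k convolution with an all-ones kernel (odd k)."""
    grid = np.zeros((size, size))
    mid = size // 2
    grid[mid, mid] = 1.0  # a single active cell, e.g. part of an object's state
    pad = k // 2
    padded = np.pad(grid, pad)  # zero padding around the border
    out = np.zeros_like(grid)
    for i in range(size):
        for j in range(size):
            out[i, j] = padded[i:i + k, j:j + k].sum()  # all-ones k x k kernel
    rows, cols = np.nonzero(out)
    return int(max(np.abs(rows - mid).max(), np.abs(cols - mid).max()))


for k in (3, 5, 7):
    print(k, reach_per_step(k))
# 3 1
# 5 2
# 7 3
```

Read loosely, a 3 × 3 transition can follow motion of about one cell per frame and a 7 × 7 transition up to three. This is a rough upper bound on propagation per step, not a statement about what a trained model actually learns.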
To ensure that the states have the same number of rows and columns as the inputs, padding is applied before the convolution operation. Here, padding the hidden states at the boundary points can be viewed as using the state of the outside world for the calculation. Usually, before the first input arrives, all the states of the LSTM are initialized to zero, which corresponds to "total ignorance" of the future.
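The padding and initialization choices can be illustrated in a few lines. This single-channel sketch assumes zero padding, so the "outside world" is treated as all zeros. The helpers `conv_output_size` and `same_conv2d` and the random kernel are hypothetical. For an odd kernel size $k$ at stride 1, padding $(k-1)/2$ cells on each side keeps the output the same size as the input. With a zero initial state, every state-to-state convolution at the first step is zero, so the first update is driven only by the input and the biases.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view  # NumPy >= 1.20


def conv_output_size(n, k, pad, stride=1):
    """Length of a convolution's output along one spatial axis."""
    return (n + 2 * pad - k) // stride + 1


def same_conv2d(x, kernel):
    """Single-channel 'same' 2-D cross-correlation with zero padding (odd kernel size)."""
    k = kernel.shape[0]
    pad = (k - 1) // 2
    windows = sliding_window_view(np.pad(x, pad), (k, k))  # shape (H, W, k, k)
    return (windows * kernel).sum(axis=(-2, -1))


height, width, k = 64, 64, 5
pad = (k - 1) // 2  # "same" padding for an odd kernel at stride 1
print(conv_output_size(height, k, pad), conv_output_size(width, k, pad))  # 64 64

# Zero initial hidden state: before the first input, H_0 = 0 ("total ignorance").
h0 = np.zeros((height, width))
rng = np.random.default_rng(0)
w_h = rng.normal(size=(k, k))  # hypothetical state-to-state kernel
out = same_conv2d(h0, w_h)
print(out.shape, float(np.abs(out).max()))  # (64, 64) 0.0
```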
Source: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting

| Task | Papers | Share |
|---|---|---|
| Semantic Segmentation | 10 | 6.10% |
| Video Prediction | 9 | 5.49% |
| Time Series | 8 | 4.88% |
| Object Detection | 5 | 3.05% |
| Optical Flow Estimation | 5 | 3.05% |
| Super-Resolution | 5 | 3.05% |
| Video Super-Resolution | 4 | 2.44% |
| Anomaly Detection | 3 | 1.83% |
| Space-Time Video Super-Resolution | 3 | 1.83% |

| Component | Type |
|---|---|
| Convolution | Convolutions |
| Sigmoid Activation | Activation Functions |
| Tanh Activation | Activation Functions |