Moving MNIST

Introduced by Srivastava et al. in Unsupervised Learning of Video Representations using LSTMs

The Moving MNIST dataset contains 10,000 video sequences of 20 frames each. In each sequence, two digits move independently within a frame that has a spatial resolution of 64×64 pixels. The digits frequently overlap one another and bounce off the edges of the frame.

Source: Mutual Suppression Network for Video Prediction using Disentangled Features
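
The motion described above can be illustrated with a minimal NumPy sketch. It is not the authors' generation script: the function name `make_sequence`, the constant speed of 3 pixels per frame, the max-based compositing of overlapping digits, and the use of arbitrary 28×28 patches in place of real MNIST digits are all assumptions made for illustration.

```python
import numpy as np


def make_sequence(digits, num_frames=20, frame_size=64, speed=3.0, rng=None):
    """Render square digit patches that move at constant velocity and
    reflect off the frame edges. Returns an array of shape
    (num_frames, frame_size, frame_size)."""
    rng = np.random.default_rng() if rng is None else rng
    patch = digits[0].shape[0]
    limit = frame_size - patch  # largest valid top-left coordinate

    # Random start position (row, col) and direction for each digit.
    pos = rng.uniform(0, limit, size=(len(digits), 2))
    angle = rng.uniform(0, 2 * np.pi, size=len(digits))
    vel = speed * np.stack([np.sin(angle), np.cos(angle)], axis=1)

    frames = np.zeros((num_frames, frame_size, frame_size), dtype=np.float32)
    for t in range(num_frames):
        for i, digit in enumerate(digits):
            r, c = np.round(pos[i]).astype(int)
            region = frames[t, r:r + patch, c:c + patch]
            # Assumed compositing rule: overlapping digits take the pixel-wise max.
            np.maximum(region, digit, out=region)

        # Advance, then reflect any coordinate that left the valid range.
        pos += vel
        low = pos < 0
        pos[low] = -pos[low]
        vel[low] = -vel[low]
        high = pos > limit
        pos[high] = 2 * limit - pos[high]
        vel[high] = -vel[high]
    return frames


# Usage with placeholder patches; real MNIST digits would be substituted here.
placeholder = [np.ones((28, 28), dtype=np.float32) for _ in range(2)]
frames = make_sequence(placeholder, rng=np.random.default_rng(0))
print(frames.shape)
```

Because each digit's velocity is sampled independently, the two patches cross paths often, which produces the occlusions noted in the description.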
