The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move independently around the frame, which has a spatial resolution of 64×64 pixels. The digits frequently intersect with each other and bounce off the edges of the frame
Source: Mutual Suppression Network for Video Prediction using Disentangled FeaturesPaper | Code | Results | Date | Stars |
---|