Random Temporal Skipping for Multirate Video Analysis
Current state-of-the-art approaches to video understanding adopt temporal jittering to simulate analyzing the video at varying frame rates. However, this does not work well for multirate videos, in which actions or subactions occur at different speeds. The frame sampling rate should vary in accordance with the different motion speeds. In this work, we propose a simple yet effective strategy, termed random temporal skipping, to address this situation. This strategy effectively handles multirate videos by randomizing the sampling rate during training. It is an exhaustive approach, which can potentially cover all motion speed variations. Furthermore, due to the large temporal skipping, our network can see video clips that originally cover over 100 frames. Such a time range is enough to analyze most actions/events. We also introduce an occlusion-aware optical flow learning method that generates improved motion maps for human action recognition. Our framework is end-to-end trainable, runs in real-time, and achieves state-of-the-art performance on six widely adopted video benchmarks.
PDF Abstract