A crucial task of Video Understanding is to recognise and localise (in space and time) different actions or events appearing in the video.
In this paper we propose a method that leverages temporal context from the unlabeled frames of a novel camera to improve performance at that camera.
The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently.
Ranked #2 on Temporal Action Localization on J-HMDB-21
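To make the AVA annotation structure described above concrete, here is a minimal Python sketch of what a single spatiotemporal label might look like: one person box at one keyframe, carrying several atomic-action labels. The field names and values are illustrative assumptions, not the official AVA schema.

```python
# Illustrative sketch of a single AVA-style annotation (field names and values
# are assumptions for illustration, not the official AVA schema): one person
# box, localized in space and time, carrying multiple atomic-action labels.
ava_style_record = {
    "video_id": "example_movie_clip",        # one of the 15-minute clips
    "timestamp": 902,                        # keyframe time within the clip (seconds)
    "person_box": [0.31, 0.18, 0.62, 0.95],  # normalized [x1, y1, x2, y2]
    "person_id": 3,                          # identity within the clip
    "action_ids": [11, 17, 64],              # several of the 80 atomic actions
}
print(len(ava_style_record["action_ids"]), "actions for this person at this keyframe")
```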
Learning to represent videos is a very challenging task both algorithmically and computationally.
We empirically demonstrate a general and robust grid schedule that yields a significant out-of-the-box training speedup without a loss in accuracy for different models (I3D, non-local, SlowFast), datasets (Kinetics, Something-Something, Charades), and training settings (with and without pre-training, 128 GPUs or 1 GPU).
Ranked #1 on Video Classification on Kinetics
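The grid schedule mentioned above varies the clip shape and mini-batch size over training. Below is a minimal Python sketch of such a schedule, under the assumption that cheaper (shorter, lower-resolution) clips are paired with proportionally larger batches so the per-iteration cost stays constant; the phase boundaries and multipliers are illustrative, not the published settings.

```python
# Minimal sketch of a multigrid-style training schedule (values are
# illustrative assumptions, not the schedule from the paper). Coarser clips
# are paired with larger mini-batches so that frames x pixels x batch, and
# hence the cost per iteration, stays constant; training ends at the base shape.
BASE = {"frames": 16, "crop": 224, "batch": 8}

# (fraction of training, temporal divisor, spatial divisor, batch multiplier)
LONG_CYCLE = [
    (0.50, 4, 2, 16),  # short, low-res clips, big batches
    (0.75, 2, 2, 8),
    (0.90, 2, 1, 2),
    (1.00, 1, 1, 1),   # finish at the full-resolution base shape
]

def shape_for_step(step, total_steps):
    """Return (num_frames, crop_size, batch_size) for a given training step."""
    progress = step / max(total_steps, 1)
    for end, t_div, s_div, b_mul in LONG_CYCLE:
        if progress <= end:
            return BASE["frames"] // t_div, BASE["crop"] // s_div, BASE["batch"] * b_mul
    return BASE["frames"], BASE["crop"], BASE["batch"]

print(shape_for_step(0, 100))   # (4, 112, 128): early phase, cheap clips, large batch
print(shape_for_step(99, 100))  # (16, 224, 8): final phase, base shape
```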
The explosive growth in video streaming gives rise to challenges in performing video understanding with high accuracy and low computation cost.
Ranked #4 on Action Recognition on Something-Something V2 (using extra training data)
This paper addresses the problem of estimating and tracking human body keypoints in complex, multi-person video.
Ranked #5 on Pose Tracking on PoseTrack2017 (using extra training data)
To understand the world, we humans constantly need to relate the present to the past, and put events in context.
Ranked #3 on Egocentric Activity Recognition on EPIC-KITCHENS-55
We demonstrate that both RNNs (using LSTMs) and Temporal-ConvNets operating on spatiotemporal feature matrices are able to exploit spatiotemporal dynamics to improve overall performance.
Ranked #33 on Action Recognition on UCF101
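As a rough illustration of the two temporal models described above, the PyTorch sketch below applies an LSTM and a stack of 1D temporal convolutions to a matrix of per-frame CNN features; the feature dimension, layer sizes, and class count are assumptions for illustration, not the configuration from the paper.

```python
# Minimal sketch (PyTorch) of the two temporal models named above, applied to a
# spatiotemporal feature matrix of per-frame CNN features. Layer sizes and the
# 2048-d feature dimension are illustrative assumptions, not the paper's config.
import torch
import torch.nn as nn

T, D, NUM_CLASSES = 25, 2048, 101  # frames per clip, per-frame feature dim, classes

class LSTMHead(nn.Module):
    """RNN (LSTM) over the sequence of per-frame features."""
    def __init__(self):
        super().__init__()
        self.lstm = nn.LSTM(D, 512, batch_first=True)
        self.fc = nn.Linear(512, NUM_CLASSES)

    def forward(self, x):               # x: (batch, T, D)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1])      # classify from the last time step

class TemporalConvHead(nn.Module):
    """1D convolutions over time, treating the feature matrix as a T x D signal."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(D, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.fc = nn.Linear(512, NUM_CLASSES)

    def forward(self, x):               # x: (batch, T, D)
        h = self.net(x.transpose(1, 2)) # -> (batch, 512, 1)
        return self.fc(h.squeeze(-1))

features = torch.randn(4, T, D)         # per-frame features from a 2D CNN backbone
print(LSTMHead()(features).shape, TemporalConvHead()(features).shape)
```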
In particular, we evaluate our method on the large-scale multi-modal YouTube-8M v2 dataset and outperform all other methods in the YouTube-8M Large-Scale Video Understanding challenge.