Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.
SOTA for Action Recognition In Videos on ActivityNet (using extra training data)
The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels.
#12 best model for Action Recognition In Videos on UCF101
Dynamics of human body skeletons convey significant information for human action recognition.
#2 best model for Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton
Furthermore, building on temporal segment networks (TSN), we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
#6 best model for Action Classification on Moments in Time
The other contribution is our study of a series of good practices for learning ConvNets on video data with the help of the temporal segment network.
#3 best model for Multimodal Activity Recognition on EV-Action
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.
#3 best model for Action Classification on HMDB51 (using extra training data)
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
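As a rough illustration of the first question, the weight count of a 3D convolution shrinks in proportion to the number of groups, because each output channel only connects to a fraction of the input channels. The sketch below is plain arithmetic and is not taken from the paper; the function name `conv3d_weight_count` and the 64-channel, 3x3x3 configuration are assumptions chosen for the example.

```python
def conv3d_weight_count(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Count the weights (bias excluded) of a 3D convolution split into `groups`.

    Hypothetical helper for illustration only; it does not come from any library.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("groups must divide both channel counts")
    kt, kh, kw = kernel
    # Each output channel sees only c_in / groups input channels.
    return (c_in // groups) * c_out * kt * kh * kw


if __name__ == "__main__":
    # groups=1 is a standard convolution; groups=c_in is a depthwise convolution.
    for g in (1, 4, 64):
        print(f"groups={g:>2}: {conv3d_weight_count(64, 64, groups=g)} weights")
```

The multiply-accumulate count per output position scales the same way, so under these assumptions a depthwise 3D convolution (groups equal to the channel count) uses 1/64 of the weights of the standard layer.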
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
#2 best model for Action Recognition In Videos on Sports-1M
Temporal relational reasoning, the ability to link meaningful transformations of objects or entities over time, is a fundamental property of intelligent species.
#2 best model for Action Recognition In Videos on Jester