The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels.
#12 best model for Action Recognition In Videos on UCF101
Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.
SOTA for Action Recognition In Videos on ActivityNet (using extra training data)
Dynamics of human body skeletons convey significant information for human action recognition.
#2 best model for Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.
SOTA for Action Recognition In Videos on UCF101 (using extra training data)
Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
#5 best model for Action Classification on Moments in Time (Top 5 Accuracy metric)
The other contribution is our study of a series of good practices for learning ConvNets on video data with the help of the temporal segment network.
#3 best model for Multimodal Activity Recognition on EV-Action
The explosive growth in video streaming gives rise to challenges in performing video understanding with high accuracy and low computational cost.
It is natural to ask: 1) if group convolution can help to alleviate the high computational cost of video classification networks; 2) what factors matter the most in 3D group convolutional networks; and 3) what are good computation/accuracy trade-offs with 3D group convolutional networks.
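To make the first question concrete, here is a minimal sketch of how grouping changes the weight count of a single 3D convolution layer. It assumes the standard definition of group convolution, in which each output channel only sees `c_in / groups` input channels. The helper name `conv3d_params`, the 64-channel layer, and the 3x3x3 kernel are illustrative choices, not values taken from the paper.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), groups=1):
    """Weight count of a 3D convolution layer (bias ignored).

    Hypothetical helper for illustration: with `groups` groups, each
    output channel connects to only c_in // groups input channels.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    kt, kh, kw = kernel
    return c_out * (c_in // groups) * kt * kh * kw


# Illustrative layer sizes (assumed, not from the paper).
standard = conv3d_params(64, 64)              # every input feeds every output
grouped = conv3d_params(64, 64, groups=4)     # 4 independent channel groups
depthwise = conv3d_params(64, 64, groups=64)  # one input channel per output

print(standard, grouped, depthwise)  # 110592 27648 1728
```

For a fixed output size the multiply-accumulate count scales by the same factor, so under these assumptions grouping by 4 cuts both the weights and the compute of this layer to a quarter.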
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
#3 best model for Action Recognition In Videos on Sports-1M