Browse > Computer Vision > Action Recognition

Action Recognition

71 papers with code · Computer Vision

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submit evaluation metrics.

Greatest papers with code

AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions

CVPR 2018 tensorflow/models

The AVA dataset densely annotates 80 atomic visual actions in 430 15-minute video clips, where actions are localized in space and time, resulting in 1.58M action labels with multiple labels per person occurring frequently. The key characteristics of our dataset are: (1) the definition of atomic visual actions, rather than composite actions; (2) precise spatio-temporal annotations with possibly multiple annotations for each person; (3) exhaustive annotation of these atomic actions over 15-minute video clips; (4) people temporally linked across consecutive segments; and (5) using movies to gather a varied set of action representations.

ACTION LOCALIZATION ACTION RECOGNITION VIDEO UNDERSTANDING

Can Spatiotemporal 3D CNNs Retrace the History of 2D CNNs and ImageNet?

CVPR 2018 kenshohara/3D-ResNets-PyTorch

The purpose of this study is to determine whether current video datasets have sufficient data for training very deep convolutional neural networks (CNNs) with spatio-temporal three-dimensional (3D) kernels. (iii) Kinetics pretrained simple 3D architectures outperforms complex 2D architectures, and the pretrained ResNeXt-101 achieved 94.5% and 70.2% on UCF-101 and HMDB-51, respectively.

ACTION RECOGNITION

Temporal Segment Networks for Action Recognition in Videos

8 May 2017yjxiong/temporal-segment-networks

We present a general and flexible video-level framework for learning action models in videos. Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.

ACTION RECOGNITION IN VIDEOS VIDEO CLASSIFICATION

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

2 Aug 2016yjxiong/temporal-segment-networks

It combines a sparse temporal sampling strategy and video-level supervision to enable efficient and effective learning using the whole action video. The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of temporal segment network.

ACTION RECOGNITION IN VIDEOS

Spatial Temporal Graph Convolutional Networks for Skeleton-Based Action Recognition

23 Jan 2018yysijie/st-gcn

Dynamics of human body skeletons convey significant information for human action recognition. Conventional approaches for modeling skeletons usually rely on hand-crafted parts or traversal rules, thus resulting in limited expressive power and difficulties of generalization.

SKELETON BASED ACTION RECOGNITION

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

CVPR 2017 deepmind/kinetics-i3d

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks. This paper re-evaluates state-of-the-art architectures in light of the new Kinetics Human Action Video dataset.

ACTION CLASSIFICATION ACTION RECOGNITION

Sparse 3D convolutional neural networks

12 May 2015facebookresearch/SparseConvNet

We have implemented a convolutional neural network designed for processing sparse three-dimensional input data. The world we live in is three dimensional so there are a large number of potential applications including 3D object recognition and analysis of space-time objects.

3D OBJECT RECOGNITION ACTION RECOGNITION

Real-time Action Recognition with Enhanced Motion Vector CNNs

CVPR 2016 yjxiong/caffe

The deep two-stream architecture exhibited excellent performance on video based action recognition. The most computationally expensive step in this approach comes from the calculation of optical flow which prevents it to be real-time.

ACTION RECOGNITION OPTICAL FLOW ESTIMATION

Towards Good Practices for Very Deep Two-Stream ConvNets

8 Jul 2015yjxiong/caffe

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident. Second, probably more importantly, the training dataset of action recognition is extremely small compared with the ImageNet dataset, and thus it will be easy to over-fit on the training dataset.

ACTION RECOGNITION IN VIDEOS

A Closer Look at Spatiotemporal Convolutions for Action Recognition

CVPR 2018 facebookresearch/R2Plus1D

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition.

ACTION RECOGNITION