Action Recognition In Videos

60 papers with code • 17 benchmarks • 16 datasets

Action Recognition in Videos is a task in computer vision and pattern recognition where the goal is to identify and categorize human actions performed in a video sequence. The task involves analyzing the spatiotemporal dynamics of the actions and mapping them to a predefined set of action classes, such as running, jumping, or swimming.


Most implemented papers

Learning Spatiotemporal Features with 3D Convolutional Networks

facebookarchive/C3D ICCV 2015

We propose a simple, yet effective approach for spatiotemporal feature learning using deep 3-dimensional convolutional networks (3D ConvNets) trained on a large scale supervised video dataset.
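As a rough illustration of what a 3D convolution does, the sketch below slides a kernel over time as well as height and width, so the output responds to motion rather than to appearance alone. It is a naive pure-Python version written for clarity, not the C3D implementation; the function name `conv3d_valid`, the toy video, and the temporal-difference kernel are all assumptions made for this example.

```python
def conv3d_valid(volume, kernel):
    """Naive 'valid' 3D cross-correlation on a single-channel (T, H, W) volume."""
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                acc = 0.0
                # Sum over the kernel's temporal and spatial extent.
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += volume[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


# Toy 4-frame, 4x4 "video" whose pixel values grow by 1 with every frame.
video = [[[float(t) for _ in range(4)] for _ in range(4)] for t in range(4)]

# Hypothetical 2x1x1 kernel: -1 on the earlier frame, +1 on the later one,
# so it measures frame-to-frame change at each pixel.
kernel = [[[-1.0]], [[1.0]]]

response = conv3d_valid(video, kernel)
```

Here `response` has one fewer frame than the input, and every entry equals the per-frame brightness change. A 2D convolution applied frame by frame could not produce this signal, because it never sees two frames at once.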

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

yjxiong/temporal-segment-networks 2 Aug 2016

The other contribution is our study on a series of good practices in learning ConvNets on video data with the help of the temporal segment network.

Temporal Segment Networks for Action Recognition in Videos

yjxiong/temporal-segment-networks 8 May 2017

Furthermore, based on the temporal segment networks, we won the video classification track at the ActivityNet challenge 2016 among 24 teams, which demonstrates the effectiveness of TSN and the proposed good practices.
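The excerpts above do not spell out how a temporal segment network works. As commonly described, the video is split into a few equal-length segments, one snippet is sampled from each, and the per-snippet class scores are combined by a consensus function. The sketch below assumes averaging as the consensus and uses made-up scores; the function names and all numbers are hypothetical and not taken from the authors' code.

```python
import random


def sample_segment_indices(num_frames, num_segments, rng=None):
    """Pick one frame index from each of num_segments equal-length segments."""
    if num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    if rng is None:
        rng = random.Random()
    seg_len = num_frames / num_segments
    indices = []
    for k in range(num_segments):
        start = int(k * seg_len)
        end = max(start + 1, int((k + 1) * seg_len))  # exclusive upper bound
        indices.append(rng.randrange(start, end))
    return indices


def average_consensus(snippet_scores):
    """Average per-snippet class scores into one video-level score vector."""
    n = len(snippet_scores)
    return [sum(column) / n for column in zip(*snippet_scores)]


# Sparse sampling: 3 snippets cover a 90-frame video.
indices = sample_segment_indices(num_frames=90, num_segments=3, rng=random.Random(0))

# Hypothetical class scores that a ConvNet might assign to each sampled snippet.
snippet_scores = [
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.5, 0.2],
]
video_scores = average_consensus(snippet_scores)
predicted_class = max(range(len(video_scores)), key=video_scores.__getitem__)
```

Because only one snippet per segment is processed, the cost stays fixed regardless of video length while the samples still span the whole clip.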

UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild

wushidonguc/two-stream-action-recognition-keras 3 Dec 2012

To the best of our knowledge, UCF101 is currently the most challenging dataset of actions due to its large number of classes, its large number of clips, and the unconstrained nature of those clips.

Two-Stream Convolutional Networks for Action Recognition in Videos

feichtenhofer/twostreamfusion NeurIPS 2014

Our architecture is trained and evaluated on the standard video actions benchmarks of UCF-101 and HMDB-51, where it is competitive with the state of the art.
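A two-stream design combines an appearance (spatial) stream with a motion (temporal) stream. The excerpt does not say how the two are merged, so the sketch below assumes simple late fusion: a weighted average of the softmax scores from the two streams. The logits, the `temporal_weight` parameter, and the class names (borrowed from the task description) are illustrative assumptions only.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(spatial_logits, temporal_logits, temporal_weight=0.5):
    """Weighted average of the two streams' class probabilities."""
    spatial = softmax(spatial_logits)
    temporal = softmax(temporal_logits)
    w = temporal_weight
    return [(1.0 - w) * s + w * t for s, t in zip(spatial, temporal)]


classes = ["running", "jumping", "swimming"]

# Hypothetical outputs: the spatial stream (single RGB frame) leans towards
# "running", while the temporal stream (stacked optical flow) favours "jumping".
spatial_logits = [2.0, 1.0, 0.1]
temporal_logits = [0.5, 2.5, 0.2]

fused = late_fusion(spatial_logits, temporal_logits)
label = classes[max(range(len(fused)), key=fused.__getitem__)]
```

With equal weights, the more confident temporal stream decides the outcome in this toy case. Raising or lowering `temporal_weight` shifts how much the motion cue counts.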

YouTube-8M: A Large-Scale Video Classification Benchmark

google/youtube-8m 27 Sep 2016

Despite the size of the dataset, some of our models train to convergence in less than a day on a single machine using TensorFlow.

Towards Good Practices for Very Deep Two-Stream ConvNets

yjxiong/caffe 8 Jul 2015

However, for action recognition in videos, the improvement of deep convolutional networks is not so evident.

Representation Flow for Action Recognition

piergiaj/representation-flow-cvpr19 CVPR 2019

Our representation flow layer is a fully-differentiable layer designed to capture the 'flow' of any representation channel within a convolutional neural network for action recognition.

You Only Watch Once: A Unified CNN Architecture for Real-Time Spatiotemporal Action Localization

wei-tim/YOWO 15 Nov 2019

YOWO is a single-stage architecture with two branches to extract temporal and spatial information concurrently and predict bounding boxes and action probabilities directly from video clips in one evaluation.