Action Classification

165 papers with code • 15 benchmarks • 21 datasets


Most implemented papers

High Quality Monocular Depth Estimation via Transfer Learning

ialhashim/DenseDepth 31 Dec 2018

Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv ICCV 2019

The output feature maps of a convolution layer can be seen as a mixture of information at different frequencies.
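The core idea behind Octave Convolution is to store the "low frequency" part of a feature map at half the spatial resolution. A minimal numpy sketch of that channel split (function names and the pooling choice are illustrative, not the paper's code):

```python
import numpy as np

def octave_split(x, alpha=0.25):
    """Octave-style feature split (sketch): route a fraction alpha of the
    channels to a half-resolution "low frequency" map via 2x2 average
    pooling, keeping the rest at full resolution. x: (C, H, W), even H, W."""
    c_low = int(alpha * x.shape[0])
    high, low = x[c_low:], x[:c_low]
    # 2x2 average pooling halves the spatial resolution of the low branch
    low = low.reshape(c_low, low.shape[1] // 2, 2,
                      low.shape[2] // 2, 2).mean(axis=(2, 4))
    return high, low

x = np.arange(2 * 4 * 4, dtype=float).reshape(2, 4, 4)
high, low = octave_split(x, alpha=0.5)  # high: (1, 4, 4), low: (1, 2, 2)
```

Because the low branch occupies a quarter of the spatial locations, convolutions on it cost proportionally less, which is where the paper's compute savings come from.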

Non-local Neural Networks

facebookresearch/video-nonlocal-net CVPR 2018

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.
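In contrast, a non-local operation lets every position aggregate information from all other positions in one step. A minimal numpy sketch of the embedded-Gaussian variant, which is softmax attention over all spatio-temporal positions (the projection matrices stand in for learned 1x1 convolutions; names are illustrative):

```python
import numpy as np

def nonlocal_block(x, w_theta, w_phi, w_g):
    """Embedded-Gaussian non-local operation over all positions.

    x: (N, C) features, one row per flattened spatio-temporal position.
    w_theta, w_phi, w_g: (C, C) projections (learned 1x1 convs in practice).
    """
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    logits = theta @ phi.T                       # (N, N) pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # softmax over ALL positions j
    return attn @ g                              # each output mixes every input

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))                  # 8 positions, 4 channels
w = [rng.standard_normal((4, 4)) for _ in range(3)]
y = nonlocal_block(x, *w)                        # y.shape == (8, 4)
```

In the paper this output is passed through one more projection and added back to `x` as a residual, so the block can be dropped into existing architectures.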

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

open-mmlab/mmaction2 CVPR 2017

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

yjxiong/temporal-segment-networks 2 Aug 2016

The other contribution is a study of good practices for learning ConvNets on video data within the temporal segment network framework.
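The sparse sampling scheme at the heart of TSN divides a video into equal segments and takes one snippet from each, so the network sees the whole duration at fixed cost. A sketch of the deterministic (test-time) variant that picks each segment's center frame (training uses a random frame per segment; the function name is illustrative):

```python
def segment_centers(num_frames, num_segments):
    """TSN-style sparse sampling (test-time sketch): split the clip into
    num_segments equal segments and return the center frame index of each."""
    seg = num_frames / num_segments
    return [int(seg * i + seg / 2) for i in range(num_segments)]

# e.g. a 90-frame clip with 3 segments
print(segment_centers(90, 3))  # [15, 45, 75]
```

Per-snippet predictions are then averaged (a "segmental consensus") to produce the video-level score.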

A Closer Look at Spatiotemporal Convolutions for Action Recognition

facebookresearch/R2Plus1D CVPR 2018

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.
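The paper's R(2+1)D block factorizes a t x d x d 3D convolution into a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal one, with the intermediate channel count M chosen so the pair matches the parameter budget of the full 3D kernel. A small sketch of that arithmetic (helper names are illustrative; the formula for M follows the paper):

```python
def midplanes(t, d, n_in, n_out):
    """Intermediate channels M for an R(2+1)D block, chosen so the
    factorized pair roughly matches a full t x d x d 3D conv in parameters."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)

def params_3d(t, d, n_in, n_out):
    return t * d * d * n_in * n_out

def params_2plus1d(t, d, n_in, n_out):
    m = midplanes(t, d, n_in, n_out)
    return d * d * n_in * m + t * m * n_out  # spatial conv + temporal conv

# e.g. a 3x3x3 kernel mapping 64 -> 64 channels
m = midplanes(3, 3, 64, 64)  # 144 intermediate channels
```

Keeping parameters comparable lets the comparison isolate the effect of the factorization itself, which also inserts an extra nonlinearity between the spatial and temporal convolutions.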

The Kinetics Human Action Video Dataset

deepmind/kinetics-i3d 19 May 2017

We describe the DeepMind Kinetics human action video dataset.

Is Space-Time Attention All You Need for Video Understanding?

facebookresearch/TimeSformer 9 Feb 2021

We present a convolution-free approach to video classification built exclusively on self-attention over space and time.
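The paper's best-performing scheme, "divided space-time attention", attends across time for each spatial patch and then across space within each frame, rather than over all patch-time pairs at once. A minimal numpy sketch of that factorization (projections, residuals, and the MLP are omitted; names are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attend(q, k, v):
    """Single-head scaled dot-product attention, batched over leading axes."""
    scale = q.shape[-1] ** -0.5
    return softmax(q @ np.swapaxes(k, -1, -2) * scale) @ v

def divided_space_time(x):
    """Divided attention sketch: time attention per patch, then space
    attention per frame. x: (T, S, C) patch features."""
    xt = np.swapaxes(x, 0, 1)   # (S, T, C): attend across time per patch
    xt = attend(xt, xt, xt)
    x = np.swapaxes(xt, 0, 1)   # back to (T, S, C)
    return attend(x, x, x)      # attend across space within each frame

rng = np.random.default_rng(1)
x = rng.standard_normal((4, 6, 8))  # 4 frames, 6 patches, 8 channels
y = divided_space_time(x)           # y.shape == (4, 6, 8)
```

The factorization reduces attention cost from O((TS)^2) to O(T^2 S + S^2 T) per layer, which is what makes longer clips tractable.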

Swin Transformer V2: Scaling Up Capacity and Resolution

microsoft/Swin-Transformer CVPR 2022

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
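The cosine-attention part of technique 1) replaces dot-product similarities with cosines scaled by a learnable temperature, so attention logits stay bounded even when activations blow up at large model sizes. A hedged numpy sketch (the learned per-head temperature and position bias are omitted; names are illustrative):

```python
import numpy as np

def cosine_attention_logits(q, k, tau=0.1):
    """Swin V2-style cosine attention logits (sketch): cosine of each
    (query, key) pair divided by a temperature tau, so every logit is
    bounded by 1/tau regardless of feature magnitude."""
    qn = q / np.linalg.norm(q, axis=-1, keepdims=True)
    kn = k / np.linalg.norm(k, axis=-1, keepdims=True)
    return (qn @ kn.T) / tau

rng = np.random.default_rng(2)
q = rng.standard_normal((5, 16)) * 1000  # very large activations...
k = rng.standard_normal((5, 16)) * 1000
logits = cosine_attention_logits(q, k)   # ...logits still bounded by 1/tau
```

With plain dot-product attention the same inputs would produce logits on the order of millions, saturating the softmax; the cosine form keeps them in a fixed range.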