Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Most implemented papers

High Quality Monocular Depth Estimation via Transfer Learning

ialhashim/DenseDepth 31 Dec 2018

Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

open-mmlab/mmaction2 CVPR 2017

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has made it difficult to identify good video architectures, as most methods obtain similar performance on existing small-scale benchmarks.

Non-local Neural Networks

facebookresearch/video-nonlocal-net CVPR 2018

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time.

Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution

facebookresearch/OctConv ICCV 2019

The output feature maps of a convolution layer can be seen as a mixture of information at different frequencies.

A Closer Look at Spatiotemporal Convolutions for Action Recognition

facebookresearch/R2Plus1D CVPR 2018

In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition.

Temporal Segment Networks: Towards Good Practices for Deep Action Recognition

yjxiong/temporal-segment-networks 2 Aug 2016

The paper also studies a series of good practices for learning ConvNets on video data with the help of the temporal segment network.

Swin Transformer V2: Scaling Up Capacity and Resolution

microsoft/Swin-Transformer CVPR 2022

Three main techniques are proposed: 1) a residual-post-norm method combined with cosine attention to improve training stability; 2) a log-spaced continuous position bias method to effectively transfer models pre-trained on low-resolution images to downstream tasks with high-resolution inputs; 3) a self-supervised pre-training method, SimMIM, to reduce the need for vast amounts of labeled images.
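As a rough illustration of the first two techniques, the sketch below computes a cosine-attention logit and a log-spaced relative coordinate in plain Python. It is a minimal sketch, not the code from microsoft/Swin-Transformer: the function names, the fixed temperature `tau=0.1`, and the scalar `bias` argument are assumptions made for illustration (in the paper the temperature is learned and the bias comes from a small network applied to the log-spaced coordinates).

```python
import math


def log_spaced_coord(delta):
    """Map a relative offset to log space: sign(delta) * log(1 + |delta|)."""
    return math.copysign(math.log1p(abs(delta)), delta)


def cosine_attention_logit(q, k, tau=0.1, bias=0.0):
    """Cosine similarity of q and k divided by a temperature, plus a bias.

    tau and bias are plain numbers here (hypothetical defaults); this only
    shows the shape of the computation for one query/key pair.
    """
    dot = sum(a * b for a, b in zip(q, k))
    norm_q = math.sqrt(sum(a * a for a in q))
    norm_k = math.sqrt(sum(b * b for b in k))
    return dot / (norm_q * norm_k) / tau + bias


if __name__ == "__main__":
    # The logit depends only on the angle between q and k, not their scale.
    print(cosine_attention_logit([1.0, 2.0], [3.0, 4.0]))
    print(cosine_attention_logit([10.0, 20.0], [3.0, 4.0]))

    # Largest offset in an 8-wide window vs. a 16-wide window.
    print(log_spaced_coord(7), log_spaced_coord(15))
```

Because the logit is normalized by the vector lengths, large activations cannot inflate attention scores, which is the stability argument behind cosine attention. The log mapping compresses large offsets, so moving to a larger window at fine-tuning time requires less extrapolation than with linear coordinates.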

Video Swin Transformer

SwinTransformer/Video-Swin-Transformer CVPR 2022

The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks.

TSM: Temporal Shift Module for Efficient Video Understanding

MIT-HAN-LAB/temporal-shift-module ICCV 2019

The explosive growth in video streaming gives rise to challenges in performing video understanding at high accuracy and low computation cost.