Action Classification
227 papers with code • 24 benchmarks • 30 datasets
Image source: The Kinetics Human Action Video Dataset
Latest papers
Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification
In this paper, we briefly introduce the solution of our team HFUT-VUT for the Micro-gesture Classification track in the MiGA challenge at IJCAI 2023.
What Can Simple Arithmetic Operations Do for Temporal Modeling?
We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.
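As a rough illustration of the idea (a minimal sketch with made-up names, not the paper's actual ATM implementation), a temporal module can be built purely from element-wise arithmetic over adjacent-frame features, followed by a cheap projection:

```python
import torch
import torch.nn as nn

class ArithmeticTemporalModule(nn.Module):
    """Toy temporal module: combines adjacent-frame features with
    element-wise subtraction and multiplication, then a linear projection.
    A simplified illustration only, not the authors' exact design."""

    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim) per-frame features
        prev = torch.roll(x, shifts=1, dims=1)  # previous frame (wraps at the clip boundary)
        diff = x - prev                          # subtraction captures motion-like change
        prod = x * prev                          # multiplication captures co-activation
        out = self.proj(torch.cat([diff, prod], dim=-1))
        return x + out                           # residual connection keeps the module lightweight

# Usage: 8 frames of 512-d features for a batch of 2 clips
feats = torch.randn(2, 8, 512)
atm = ArithmeticTemporalModule(512)
print(atm(feats).shape)  # torch.Size([2, 8, 512])
```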
Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers
Both PAAT and PAAB surpass their respective backbone Transformers by up to 9.8% in real-world action recognition and 21.8% in multi-view robotic video alignment.
HomE: Homography-Equivariant Video Representation Learning
In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE).
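One way to read the stated goal, sketched under my own assumptions rather than the authors' implementation: warp a clip with a known homography, and penalize the representation of the warped view for deviating from a corresponding transformation (here, a hypothetical learnable linear map) of the original view's representation.

```python
import torch
import torch.nn as nn

# z      : representation of the original clip, shape (batch, dim)
# z_warp : representation of the same clip after warping its frames with a
#          known homography H (the warping step itself is not shown here)
# t_h    : transformation associated with H in representation space; using a
#          learnable linear map is an assumption made for this sketch
def homography_equivariance_loss(z: torch.Tensor,
                                 z_warp: torch.Tensor,
                                 t_h: nn.Linear) -> torch.Tensor:
    """Penalize deviation from equivariance: the representation of the warped
    view should match the transformed representation of the original view."""
    return torch.nn.functional.mse_loss(t_h(z), z_warp)

# Toy usage with random features standing in for an encoder's outputs
z, z_warp = torch.randn(4, 256), torch.randn(4, 256)
t_h = nn.Linear(256, 256)
print(homography_equivariance_loss(z, z_warp, t_h).item())
```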
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
In this work, we explore a scalable way for building a general representation model toward unlimited modalities.
Implicit Temporal Modeling with Learnable Alignment for Video Recognition
While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already captures the essential temporal cues without temporal attention.
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).
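As a rough sketch of the dual-masking idea as described in the abstract (the ratios and names below are illustrative, not the paper's settings): the encoder processes only a small fraction of video tokens, and the decoder reconstructs only a subset of the remaining masked tokens rather than all of them, which reduces memory and compute.

```python
import torch

def dual_mask(num_tokens: int, encoder_keep: float = 0.1, decoder_keep: float = 0.5):
    """Illustrative dual masking: choose visible tokens for the encoder and a
    subset of the remaining (masked) tokens as reconstruction targets for the
    decoder. Ratios here are made up for the example."""
    perm = torch.randperm(num_tokens)
    n_vis = int(num_tokens * encoder_keep)
    visible_idx = perm[:n_vis]                 # tokens the encoder processes
    masked_idx = perm[n_vis:]                  # tokens hidden from the encoder
    n_dec = int(masked_idx.numel() * decoder_keep)
    target_idx = masked_idx[torch.randperm(masked_idx.numel())[:n_dec]]
    return visible_idx, target_idx

visible, targets = dual_mask(num_tokens=1568)  # e.g. a 16x14x14 grid of video patches
print(visible.numel(), targets.numel())
```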
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.
The effectiveness of MAE pre-pretraining for billion-scale pretraining
While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.