Action Recognition

881 papers with code • 49 benchmarks • 105 datasets

Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to assign each action performed in the video or image to one of a predefined set of action classes.
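
As a rough illustration of classifying a clip into a predefined set of classes, the sketch below averages per-frame class scores over time and picks the highest-scoring class. It is only a minimal example under stated assumptions: the label set `ACTION_CLASSES`, the function `classify_clip`, and the score values are all hypothetical, and the per-frame scores are assumed to come from some upstream model that is not shown. The models on the leaderboards use far more elaborate temporal modelling.

```python
from typing import Sequence

# Hypothetical label set; real benchmarks define their own classes.
ACTION_CLASSES = ["walking", "jumping", "waving"]


def classify_clip(frame_scores: Sequence[Sequence[float]],
                  classes: Sequence[str] = ACTION_CLASSES) -> str:
    """Average per-frame class scores over time and return the top class."""
    if not frame_scores:
        raise ValueError("clip has no frames")
    num_classes = len(classes)
    totals = [0.0] * num_classes
    for scores in frame_scores:
        if len(scores) != num_classes:
            raise ValueError("each frame needs one score per class")
        for i, score in enumerate(scores):
            totals[i] += score
    # Temporal average pooling: one mean score per class.
    means = [total / len(frame_scores) for total in totals]
    best = max(range(num_classes), key=means.__getitem__)
    return classes[best]


# Made-up scores for a three-frame clip (one row per frame).
clip = [[0.1, 0.7, 0.2], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
predicted = classify_clip(clip)
```

Averaging before taking the maximum lets a class that is consistently likely across frames win over a class that peaks in a single frame, which is one simple way to turn frame-level evidence into a clip-level label.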

In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, having on the order of 10k videos.

Please note that some benchmarks, e.g. Kinetics-400, may be listed under the Action Classification or Video Classification tasks instead.

Benchmarks

These leaderboards are used to track progress in Action Recognition.

Dataset	Best Model
Something-Something V2	InternVideo2-6B
UCF101	VideoMAE V2-g
HMDB-51	VideoMAE V2-g
Something-Something V1	InternVideo
AVA v2.2	LART (Hiera-H, K700 PT+FT)
EPIC-KITCHENS-100	Avion (ViT-L)
NTU RGB+D	PoseC3D (RGB + Pose)
NTU RGB+D 120	PoseC3D (RGB + Pose)
Diving-48	Video-FocalNet-B
ActivityNet	Text4Vis (w/ ViT-L)
AVA v2.1	STAR/L
THUMOS’14	BMN
Sports-1M	ip-CSN-152 (RGB)
HACS	InternVideo2-6B
Charades-Ego	LaViLa (Finetuned, TimeSformer-L)
HAA500	TSN
BAR	DebiAN
UAV-Human	PMI Sampler
Volleyball	PoseC3D (Pose Only)
Real Life Violence Situations Dataset	DeVTr
RareAct	🦩 Flamingo
Jester (Gesture Recognition)	DirecFormer
miniSports	IF+MD+RGB-R (ResNet-18)
IRD	OHA-GCN (Two stream; HP + OHP-hands + informative samples)
ICVL-4	OHA-GCN (Two stream; HP + OHP-hands + informative samples)
UCF-101	DMC-Net (ResNet-18)
Mimetics	JMRN
Drone-Action	FAR
Okutama-Action	PLAR with bbox (Ours)
Animal Kingdom	MSQNet
Charades	MSQNet
VIRAT Ground 2.0	DHCM
ActionNet-VE	Baseline
UTD-MHAD	Action Machine (RGB only)
EgoGesture	TSM+W3
EPIC-KITCHENS-55	TSM+W3 - full res
HMDB51	MSQNet
MECCANO	SlowFast
Win-Fail Action Understanding	2DCNN+TRN
MTL-AQA	C3D-AVG
UCF 101	R2+1D-BERT
Penn Action	STAR-Transformer (RGB + Pose)
Skeleton-Mimetics	Structured Keypoint Pooling
RoCoG-v2	AZTR (Ours)
NEC Drone	FAR
UAV Human	FAR
THUMOS14	MSQNet
Hockey	MSQNet
N-UCLA	DVANet

Libraries

Use these libraries to find Action Recognition models and implementations

open-mmlab/mmaction2

20 papers

3,888

towhee-io/towhee

10 papers

2,991

yjxiong/caffe

4 papers

550

rwightman/pytorch-image-models

3 papers

29,758

See all 8 libraries.

Datasets

Subtasks

Few Shot Action Recognition

Fine-grained Action Recognition

Action Triplet Recognition

Open Set Action Recognition

Micro-Action Recognition

Weakly-Supervised Action Recognition

Atomic Action Recognition

Animal Action Recognition

Transportation Mode Detection

Open Vocabulary Action Recognition

Action Recognition In Still Images

Latest papers

CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment

zhoukanglei/cofinal_aqa • 22 Apr 2024

However, this common strategy yields suboptimal results due to the inherent struggle of these backbones to capture the subtle cues essential for AQA.

Aligning Actions and Walking to LLM-Generated Textual Descriptions

radu1999/walkandtext • 18 Apr 2024

For action recognition, we employ LLMs to generate textual descriptions of actions in the BABEL-60 dataset, facilitating the alignment of motion sequences with linguistic representations.

VG4D: Vision-Language Model Goes 4D Video Recognition

shark0-0/vg4d • 17 Apr 2024

By transferring the knowledge of the VLM to the 4D encoder and combining the VLM, our VG4D achieves improved recognition performance.

ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos

faceonlive/ai-research • 9 Apr 2024

Our framework leverages both labeled and unlabelled data to robustly learn action representations in videos, combining pseudo-labeling with contrastive learning for effective learning from both types of samples.

152

TIM: A Time Interval Machine for Audio-Visual Action Recognition

faceonlive/ai-research • 8 Apr 2024

We address the interplay between the two modalities in long videos by explicitly modelling the temporal extents of audio and visual events.

152

PREGO: online mistake detection in PRocedural EGOcentric videos

aleflabo/prego • 2 Apr 2024

We propose PREGO, the first online one-class classification model for mistake detection in PRocedural EGOcentric videos.

Disentangled Pre-training for Human-Object Interaction Detection

xingaoli/dp-hoi • 2 Apr 2024

Therefore, we propose an efficient disentangled pre-training method for HOI detection (DP-HOI) to address this problem.

OmniVid: A Generative Framework for Universal Video Understanding

wangjk666/omnivid • 26 Mar 2024

The core of video understanding tasks, such as recognition, captioning, and tracking, is to automatically detect objects or actions in a video and analyze their temporal evolution.

Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects

zc-alexfan/arctic • 25 Mar 2024

We interact with the world with our hands and see it through our own (egocentric) perspective.

224

Understanding Long Videos in One Multimodal Language Model Pass

kahnchana/mvu • 25 Mar 2024

In addition to faster inference, we discover the resulting models to yield surprisingly good accuracy on long-video tasks, even with no video specific information.
