Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Image source: The Kinetics Human Action Video Dataset

Benchmarks

These leaderboards are used to track progress in Action Classification.

Dataset	Best Model
Kinetics-400	InternVideo2-6B
Kinetics-600	InternVideo2-6B
Charades	TokenLearner
Kinetics-700	InternVideo2-6B
MiT	InternVideo2-6B
Toyota Smarthome dataset	π-ViT
AViD	TokenLearner
THUMOS’14	3C-Net
ActivityNet-1.2	W-TALC
Kinetics-Sounds	Mirasol3B
TTStroke-21 ME22	RGB and PRGB
HMDB51	DualPath w/ ViT-B/16 MLPs.
MiniKinetics	MARS+RGB+Flow (16 frames)
YouCook2	VideoBERT (cross modal)
UCF101	Ours
Something-Something V2	CAST-B/16
THUMOS'14	3C-Net
Jester test	C2F
BABEL	2s-AGCN
ActivityNet	UniFormerV2-L
TTStroke-21 ME21	STCNN
Diving-48	DualPath w/ ViT-B/16
CelebV-HQ	MARLIN
Moments in Time	OmniVec

Libraries

Use these libraries to find Action Classification models and implementations.

open-mmlab/mmaction2 • 15 papers • 3,876 stars
towhee-io/towhee • 8 papers • 2,972 stars
rwightman/pytorch-image-models • 4 papers • 29,671 stars
facebookresearch/pytorchvideo • 3 papers • 3,178 stars

18 libraries are listed in total.

Latest papers with no code

After-Stroke Arm Paresis Detection using Kinematic Data

no code yet • 3 Nov 2023

This paper presents an approach for detecting unilateral arm paralysis/weakness using kinematic data.

Proposal-based Temporal Action Localization with Point-level Supervision

no code yet • 9 Oct 2023

Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data.

SkeleTR: Towards Skeleton-based Action Recognition in the Wild

no code yet • 20 Sep 2023

SkeleTR first models the intra-person skeleton dynamics of each skeleton sequence with graph convolutions, and then uses stacked Transformer encoders to capture the person interactions that are important for action recognition in general scenarios.

Semi Supervised Meta Learning for Spatiotemporal Learning

no code yet • 9 Jul 2023

Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures.

Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition

no code yet • 23 Jun 2023

Implementing this model with unsupervised STDP-based convolutional spiking neural networks (CSNNs) allows us to further study the performance of these networks on video analysis.

How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks

no code yet • 9 Jun 2023

Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks.

Human Action Recognition in Egocentric Perspective Using 2D Object and Hands Pose

no code yet • 8 Jun 2023

Egocentric action recognition is essential for healthcare and assistive technology that relies on egocentric cameras because it allows for the automatic and continuous monitoring of activities of daily living (ADLs) without requiring any conscious effort from the user.

Self-Supervised Video Representation Learning via Latent Time Navigation

no code yet • 10 May 2023

Self-supervised video representation learning has aimed at maximizing similarity between different temporal segments of one video in order to enforce feature persistence over time.

AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation

no code yet • CVPR 2023

To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger dataset.

VicTR: Video-conditioned Text Representations for Activity Recognition

no code yet • 5 Apr 2023

In this paper, we argue that better video-VLMs can be designed by focusing more on augmenting text rather than visual information.
