Action Classification
227 papers with code • 24 benchmarks • 30 datasets
Image source: The Kinetics Human Action Video Dataset
Libraries
Use these libraries to find Action Classification models and implementations.
Latest papers with no code
Learning Correlation Structures for Vision Transformers
We introduce a new attention mechanism, dubbed structural self-attention (StructSA), that leverages rich correlation patterns naturally emerging in key-query interactions of attention.
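For context, StructSA builds on standard scaled dot-product self-attention, whose key-query correlation map it enriches with structural patterns. A minimal sketch of that baseline (not the StructSA method itself; the function name and shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Baseline self-attention; StructSA extends the q-k correlation step.

    q, k, v: arrays of shape (seq_len, dim).
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)  # key-query correlation map
    # softmax over keys (numerically stable)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v
```

StructSA's contribution lies in how it processes the `scores` correlation map before aggregation, rather than treating each score independently.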
Classification of Tennis Actions Using Deep Learning
Recent advances in deep learning make it possible to identify specific events in videos with greater precision.
Robustness Evaluation of Machine Learning Models for Robot Arm Action Recognition in Noisy Environments
This paper studies robot arm action recognition in noisy environments using machine learning techniques.
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
To address these issues, we propose 1) a more challenging reformulation of temporal self-supervision as frame-level (rather than clip-level) recognition tasks and 2) an effective augmentation strategy to mitigate shortcuts.
ST(OR)2: Spatio-Temporal Object Level Reasoning for Activity Recognition in the Operating Room
Surgical robotics holds much promise for improving patient safety and clinician experience in the Operating Room (OR).
AdaFocus: Towards End-to-end Weakly Supervised Learning for Long-Video Action Understanding
Under the weak supervision setting, action labels are provided for the whole video without precise start and end times of the action clip.
ADM-Loc: Actionness Distribution Modeling for Point-supervised Temporal Action Localization
This paper addresses the challenge of point-supervised temporal action detection, in which only one frame per action instance is annotated in the training set.
Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities
We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential.
OmniVec: Learning robust representations with cross modal sharing
We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, allowing us to achieve state-of-the-art results on most of the benchmarks.
Asymmetric Masked Distillation for Pre-Training Small Foundation Models
AMD achieves 73.3% classification accuracy using the ViT-B model on the Something-Something V2 dataset, a 3.7% improvement over the original ViT-B model from VideoMAE.