Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Joint Skeletal and Semantic Embedding Loss for Micro-gesture Classification

VUT-HFUT/MiGA2023_Track1 20 Jul 2023

In this paper, we briefly introduce the solution of our team HFUT-VUT for Micro-gesture Classification in the MiGA challenge at IJCAI 2023.

What Can Simple Arithmetic Operations Do for Temporal Modeling?

whwu95/ATM ICCV 2023

We conduct comprehensive ablation studies on the instantiation of ATMs and demonstrate that this module provides powerful temporal modeling capability at a low computational cost.

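
The claim above is that cheap arithmetic between frame features can stand in for heavier temporal attention. A minimal sketch of that idea, combining adjacent frame features with element-wise subtraction and multiplication (the function name and the diff/product pairing are illustrative, not the paper's actual ATM module):

```python
import numpy as np

def arithmetic_temporal_features(frames: np.ndarray) -> np.ndarray:
    """Combine adjacent per-frame features with cheap arithmetic ops.

    frames: (T, C) array of per-frame feature vectors.
    Returns (T-1, 2*C): element-wise difference and product of
    neighbouring frames, concatenated. Illustrative sketch only.
    """
    prev, nxt = frames[:-1], frames[1:]
    diff = nxt - prev   # subtraction captures frame-to-frame change
    prod = nxt * prev   # multiplication captures co-activation
    return np.concatenate([diff, prod], axis=-1)

feats = np.random.randn(8, 16)            # 8 frames, 16-dim features
out = arithmetic_temporal_features(feats)
print(out.shape)                           # (7, 32)
```

Unlike attention, these operations add no learned parameters and cost O(T·C), which is why such modules can be attractive at low compute budgets.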

Seeing the Pose in the Pixels: Learning Pose-Aware Representations in Vision Transformers

dominickrei/poseawarevt 15 Jun 2023

Both PAAT and PAAB surpass their respective backbone Transformers by up to 9.8% in real-world action recognition and 21.8% in multi-view robotic video alignment.

HomE: Homography-Equivariant Video Representation Learning

anirudhs123/home 2 Jun 2023

In this work, we propose a novel method for representation learning of multi-view videos, where we explicitly model the representation space to maintain Homography Equivariance (HomE).

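
Homography equivariance asks that warping a view by a homography correspond to a predictable transform of its representation. For intuition, the sketch below shows only the pixel-space side of that constraint: how a 3×3 homography acts on 2D points (the helper name and the example matrix are hypothetical, not from the paper's code):

```python
import numpy as np

def apply_homography(H: np.ndarray, pts: np.ndarray) -> np.ndarray:
    """Map Nx2 points through a 3x3 homography with perspective divide."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

# A pure translation by (2, 3) is a (degenerate) homography.
H = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[0.0, 0.0], [1.0, 1.0]])
print(apply_homography(H, pts))  # [[2. 3.], [3. 4.]]
```

An equivariant video encoder would mirror this geometric action in its representation space, so that representations of differently-posed views remain related by a known transform.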

Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles

facebookresearch/hiera 1 Jun 2023

Modern hierarchical vision transformers have added several vision-specific components in the pursuit of supervised classification performance.

ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities

modelscope/modelscope 18 May 2023

In this work, we explore a scalable way for building a general representation model toward unlimited modalities.

Implicit Temporal Modeling with Learnable Alignment for Video Recognition

francis-rings/ila ICCV 2023

While modeling temporal information within a straight-through tube is widely adopted in the literature, we find that simple frame alignment already provides enough essence without temporal attention.

VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking

OpenGVLab/VideoMAEv2 CVPR 2023

Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).

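
Dual masking means the encoder only sees a small visible subset of video tokens while the decoder reconstructs only a subset of the masked ones, cutting compute on both sides. A rough sketch of that sampling step (the function name and the keep ratios are illustrative, not VideoMAE V2's exact configuration):

```python
import numpy as np

def dual_masking(num_tokens: int, enc_keep: float, dec_keep: float, rng):
    """Sample encoder-visible tokens and decoder reconstruction targets.

    The encoder processes only `enc_keep` of all tokens; the decoder
    reconstructs only `dec_keep` of the tokens hidden from the encoder.
    """
    perm = rng.permutation(num_tokens)
    n_enc = int(num_tokens * enc_keep)
    enc_visible = perm[:n_enc]        # tokens the encoder sees
    hidden = perm[n_enc:]             # tokens masked from the encoder
    n_dec = int(len(hidden) * dec_keep)
    dec_targets = rng.permutation(hidden)[:n_dec]  # decoder's targets
    return enc_visible, dec_targets

rng = np.random.default_rng(0)
enc, dec = dual_masking(1568, enc_keep=0.1, dec_keep=0.5, rng=rng)
print(len(enc), len(dec))  # 156 706
```

With a 10% encoder keep ratio, attention cost on the encoder side drops roughly 100× versus processing all tokens, which is what makes billion-parameter video ViT pre-training tractable.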

Unmasked Teacher: Towards Training-Efficient Video Foundation Models

opengvlab/unmasked_teacher ICCV 2023

Previous VFMs rely on Image Foundation Models (IFMs), which face challenges in transferring to the video domain.

The effectiveness of MAE pre-pretraining for billion-scale pretraining

facebookresearch/maws ICCV 2023

While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well.
