Action Classification

227 papers with code • 24 benchmarks • 30 datasets

Dual-path Adaptation from Image to Video Transformers

park-jungin/dualpath CVPR 2023

In this paper, we efficiently transfer the surpassing representation power of the vision foundation models, such as ViT and Swin, for video understanding with only a few trainable parameters.

39
17 Mar 2023

Scaling Vision Transformers to 22 Billion Parameters

lucidrains/flash-cosine-sim-attention 10 Feb 2023

The scaling of Transformers has driven breakthrough capabilities for language models.

192
10 Feb 2023

AIM: Adapting Image Models for Efficient Video Action Recognition

taoyang1122/adapt-image-models 6 Feb 2023

Recent vision transformer based video models mostly follow the "image pre-training then finetuning" paradigm and have achieved great success on multiple video benchmarks.

241
06 Feb 2023

Baseline Method for the Sport Task of MediaEval 2022 with 3D CNNs using Attention Mechanisms

ccp-eva/sporttaskme22 6 Feb 2023

We propose two types of 3D-CNN architectures to solve the two subtasks.

4
06 Feb 2023

Fine-Grained Action Detection with RGB and Pose Information using Two Stream Convolutional Networks

fidsinn/sporttaskme22 6 Feb 2023

As participants of the MediaEval 2022 Sport Task, we propose a two-stream network approach for the classification and detection of table tennis strokes.

1
06 Feb 2023

mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video

modelscope/modelscope 1 Feb 2023

In contrast to predominant paradigms of solely relying on sequence-to-sequence generation or encoder-based instance discrimination, mPLUG-2 introduces a multi-module composition network by sharing common universal modules for modality collaboration and disentangling different modality modules to deal with modality entanglement.

6,101
01 Feb 2023

Hierarchical Explanations for Video Action Recognition

sadafgulshad1/Hierarchical-ProtoPNet 1 Jan 2023

To interpret deep neural networks, one main approach is to dissect the visual input and find the prototypical parts responsible for the classification.

1
01 Jan 2023

Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models

whwu95/Cap4Video CVPR 2023

In this paper, we propose a novel framework called BIKE, which utilizes the cross-modal bridge to explore bidirectional knowledge: i) We introduce the Video Attribute Association mechanism, which leverages the Video-to-Text knowledge to generate textual auxiliary attributes for complementing video recognition.

203
31 Dec 2022

Learning Video Representations from Large Language Models

facebookresearch/lavila CVPR 2023

We introduce LaViLa, a new approach to learning video-language representations by leveraging Large Language Models (LLMs).

440
08 Dec 2022

Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning

ruiwang2021/mvd CVPR 2023

For the choice of teacher models, we observe that students taught by video teachers perform better on temporally-heavy video tasks, while image teachers transfer stronger spatial representations for spatially-heavy video tasks.

89
08 Dec 2022