Action Recognition
881 papers with code • 49 benchmarks • 105 datasets
Action Recognition is a computer vision task that involves recognizing human actions in videos or images. The goal is to classify and categorize the actions being performed in the video or image into a predefined set of action classes.
In the video domain, it is an open question whether training an action classification network on a sufficiently large dataset will give a similar boost in performance when applied to a different temporal task or dataset. The challenges of building video datasets have meant that most popular benchmarks for action recognition are small, on the order of 10k videos.
Please note some benchmarks may be located in the Action Classification or Video Classification tasks, e.g. Kinetics-400.
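As a concrete illustration of the task definition above, here is a minimal sketch of the simplest video-level baseline: score each frame with an image classifier, average the per-frame class scores, and take the argmax over a predefined label set. The action names and the `classify_video` helper are assumptions for illustration, not part of any benchmark.

```python
import numpy as np

# Hypothetical predefined action classes; any label set works.
ACTIONS = ["walking", "jumping", "waving"]

def classify_video(frame_scores: np.ndarray) -> str:
    """Late-fusion baseline: average per-frame class scores
    (shape [num_frames, num_classes]) and pick the argmax class."""
    assert frame_scores.shape[1] == len(ACTIONS)
    mean_scores = frame_scores.mean(axis=0)
    return ACTIONS[int(mean_scores.argmax())]

# Toy scores for a 4-frame clip: "jumping" dominates on average.
scores = np.array([
    [0.20, 0.70, 0.10],
    [0.10, 0.80, 0.10],
    [0.30, 0.50, 0.20],
    [0.25, 0.60, 0.15],
])
print(classify_video(scores))  # jumping
```

Real systems replace the averaging with temporal models (3D CNNs, transformers), but this late-fusion baseline is a common reference point.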
Libraries
Use these libraries to find Action Recognition models and implementations.
Subtasks
- Action Recognition In Videos
- 3D Action Recognition
- Self-Supervised Action Recognition
- Few Shot Action Recognition
- Fine-grained Action Recognition
- Action Triplet Recognition
- Open Set Action Recognition
- Micro-Action Recognition
- Weakly-Supervised Action Recognition
- Atomic Action Recognition
- Animal Action Recognition
- Transportation Mode Detection
- Open Vocabulary Action Recognition
- Action Recognition In Still Images
Latest papers with no code
Driver Activity Classification Using Generalizable Representations from Vision-Language Models
In this paper, we present a novel approach leveraging generalizable representations from vision-language models for driver activity classification.
Combating Missing Modalities in Egocentric Videos at Test Time
Understanding videos that contain multiple modalities is crucial, especially in egocentric videos, where combining various sensory inputs significantly improves tasks like action recognition and moment localization.
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
The denoised text classes help OVAR models classify visual samples more accurately; in return, the classified visual samples help denoise the text classes further.
Attack on Scene Flow using Point Clouds
Robustness of these techniques, however, remains a concern, particularly in the face of adversarial attacks that have been proven to deceive state-of-the-art deep neural networks in many domains.
Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition
Existing methods usually adopt a two-stage pipeline, where object proposals are first detected using a pretrained detector, and then are fed to an action recognition model for extracting video features and learning the object relations for action recognition.
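The two-stage pipeline described above can be sketched structurally as follows; the `detect_proposals` and `recognize_action` stubs are hypothetical stand-ins for a pretrained detector and an action model, not real APIs.

```python
import numpy as np

def detect_proposals(frames: np.ndarray) -> list:
    """Stage 1 stand-in for a pretrained object detector:
    returns one dummy box per frame as (x1, y1, x2, y2)."""
    return [(0, 0, 10, 10) for _ in frames]

def recognize_action(frames: np.ndarray, proposals: list) -> str:
    """Stage 2 stand-in: an action model that consumes the frames
    together with the detected object proposals."""
    return "picking_up" if proposals else "unknown"

frames = np.zeros((4, 32, 32, 3))             # a tiny 4-frame clip
proposals = detect_proposals(frames)          # stage 1: detection
action = recognize_action(frames, proposals)  # stage 2: recognition
print(action)
```

The paper's point is that these two stages are usually trained separately; the detector is frozen and its proposals are simply handed to the action model.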
Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network
In this paper, a method for lower limb movement recognition is proposed, based on support-vector-machine recursive feature elimination and a backpropagation neural network.
MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition
To address this issue, we propose an innovative Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation (MK-SGN).
Learning to Score Sign Language with Two-stage Method
Human action recognition and performance assessment have been hot research topics in recent years.
HumMUSS: Human Motion Understanding using State Space Models
Understanding human motion from video is essential for a range of applications, including pose estimation, mesh recovery and action recognition.
Leveraging Temporal Contextualization for Video Action Recognition
We propose Temporal Contextualization (TC), a novel layer-wise temporal information infusion mechanism for video. TC extracts core information from each frame, interconnects relevant information across the video to summarize it into context tokens, and ultimately leverages those context tokens during the feature encoding process.
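The idea of summarizing per-frame tokens into a few video-level context tokens can be sketched with plain cross-frame attention pooling. This is an illustrative sketch only, not the paper's TC implementation; the shapes and the `summarize_context` name are assumptions.

```python
import numpy as np

def summarize_context(frame_tokens: np.ndarray, queries: np.ndarray) -> np.ndarray:
    """Illustrative cross-frame attention pooling: each query attends
    over all frame tokens (flattened across time) to produce one
    context token.  frame_tokens: [T, N, D]; queries: [K, D]."""
    T, N, D = frame_tokens.shape
    tokens = frame_tokens.reshape(T * N, D)        # pool across the whole video
    logits = queries @ tokens.T / np.sqrt(D)       # [K, T*N] similarity scores
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over all tokens
    return weights @ tokens                        # [K, D] context tokens

rng = np.random.default_rng(0)
ctx = summarize_context(rng.normal(size=(8, 4, 16)),  # 8 frames x 4 tokens
                        rng.normal(size=(2, 16)))     # 2 context queries
print(ctx.shape)  # (2, 16)
```

Because the attention spans every frame, each context token can aggregate evidence from the whole clip rather than from a single frame.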