Action Classification
195 papers with code • 20 benchmarks • 26 datasets
Image source: The Kinetics Human Action Video Dataset
Libraries
Use these libraries to find Action Classification models and implementations.
Latest papers with no code
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Finally, we successfully train a video ViT model with a billion parameters, which achieves a new state-of-the-art performance on the datasets of Kinetics (90.0% on K400 and 89.9% on K600) and Something-Something (68.7% on V1 and 77.0% on V2).
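To make the dual-masking idea concrete, here is a minimal sketch (not the paper's implementation) of the key trick: the encoder sees only a small visible subset of video patch tokens, while the decoder reconstructs only a random subset of the masked tokens rather than all of them. Function names, ratios, and the token count below are illustrative assumptions.

```python
import numpy as np

def dual_masks(num_tokens, encoder_mask_ratio=0.9, decoder_keep_ratio=0.5, seed=0):
    """Sketch of dual masking for a masked video autoencoder.

    Returns two boolean arrays over the patch tokens:
      visible -- tokens the encoder actually processes
      decode  -- masked tokens the decoder is asked to reconstruct
    Ratios here are illustrative, not the paper's exact settings."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_tokens)
    num_masked = int(num_tokens * encoder_mask_ratio)
    masked_idx = perm[:num_masked]              # hidden from the encoder
    visible = np.ones(num_tokens, dtype=bool)
    visible[masked_idx] = False
    # the decoder reconstructs only a fraction of the masked tokens,
    # which is what cuts decoder compute relative to full reconstruction
    num_decode = int(num_masked * decoder_keep_ratio)
    decode_idx = rng.choice(masked_idx, size=num_decode, replace=False)
    decode = np.zeros(num_tokens, dtype=bool)
    decode[decode_idx] = True
    return visible, decode

# e.g. a 16-frame clip tokenized into 1568 patch tokens (assumed number)
visible, decode = dual_masks(1568)
print(visible.sum(), decode.sum())
```

With a 90% encoder mask and 50% decoder keep ratio, the encoder processes roughly a tenth of the tokens and the decoder reconstructs roughly half of the rest, which is where the scalability gain comes from.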
Multi-modal Prompting for Low-Shot Temporal Action Localization
In this paper, we consider the problem of temporal action localization under the low-shot (zero-shot & few-shot) scenario, with the goal of detecting and classifying action instances from arbitrary categories within untrimmed videos, even categories not seen at training time.
Classification of Primitive Manufacturing Tasks from Filtered Event Data
Several filters are compared and combined to remove event data noise.
Scaling Vision Transformers to 22 Billion Parameters
The scaling of Transformers has driven breakthrough capabilities for language models.
Deep Dependency Networks for Multi-Label Classification
We propose a simple approach which combines the strengths of probabilistic graphical models and deep learning architectures for solving the multi-label classification task, focusing specifically on image and video data.
Actor-Director-Critic: A Novel Deep Reinforcement Learning Framework
For each of the two critic networks used, we design two target critic networks instead of one.
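A minimal sketch of maintaining two Polyak-averaged target networks per critic, with the target value taken conservatively as the minimum over all four targets. The toy linear "networks", the per-target update rates, and the min-aggregation are assumptions for illustration, not the paper's exact scheme.

```python
import numpy as np

def soft_update(target, source, tau=0.005):
    """Polyak-average a target network's weights toward the online network."""
    return (1.0 - tau) * target + tau * source

# Toy "networks" as weight vectors; a real agent would use neural nets.
rng = np.random.default_rng(0)
critics = [rng.normal(size=4) for _ in range(2)]       # two online critics
# two target networks per critic (four targets total)
targets = [[c.copy(), c.copy()] for c in critics]

for step in range(100):
    # stand-in for a gradient update of the online critics
    for c in critics:
        c += 0.01 * rng.normal(size=4)
    # assumed schedule: each critic's two targets track it at different
    # rates, so one target lags further behind the online network
    for i, c in enumerate(critics):
        targets[i][0] = soft_update(targets[i][0], c, tau=0.01)
        targets[i][1] = soft_update(targets[i][1], c, tau=0.001)

def target_q(state):
    """Conservative bootstrap target: min over all four target critics."""
    return min(float(t @ state) for pair in targets for t in pair)

print(target_q(np.ones(4)))
```

Taking the minimum over multiple lagged targets is one common way to curb overestimation bias in value bootstrapping (as in clipped double Q-learning); how the paper actually combines its four targets may differ.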
HierVL: Learning Hierarchical Video-Language Embeddings
Video-language embeddings are a promising avenue for injecting semantics into visual representations, but existing methods capture only short-term associations between seconds-long video clips and their accompanying text.
Hierarchical Explanations for Video Action Recognition
We propose Hierarchical ProtoPNet: an interpretable network that explains its reasoning process by considering the hierarchical relationship between classes.
Self-supervised and Weakly Supervised Contrastive Learning for Frame-wise Action Representations
In this paper, we introduce a new framework of contrastive action representation learning (CARL) to learn frame-wise action representation in a self-supervised or weakly-supervised manner, especially for long videos.
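Frame-wise contrastive learning of this kind is often built on an InfoNCE loss where embeddings of the same frame index in two augmented views of a video are positives and all other frames are negatives. The sketch below illustrates that generic setup; the loss form, temperature, and dimensions are assumptions, not CARL's exact objective.

```python
import numpy as np

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def frame_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE over frames of one video.

    z1, z2: (num_frames, dim) L2-normalized frame embeddings from two
    augmented views; positives sit on the diagonal of the similarity
    matrix, every other frame acts as a negative."""
    logits = (z1 @ z2.T) / temperature           # (T, T) similarities
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

# toy data: view2 is a lightly perturbed copy of view1 (32 frames, dim 128)
rng = np.random.default_rng(0)
view1 = l2_normalize(rng.normal(size=(32, 128)))
view2 = l2_normalize(view1 + 0.05 * rng.normal(size=(32, 128)))
print(frame_nce_loss(view1, view2))
```

When the two views are temporally aligned the loss is low; misaligning them (e.g. reversing one view's frame order) raises it, which is the signal that drives frame-wise representations.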
Spatio-Temporal Crop Aggregation for Video Representation Learning
We propose Spatio-temporal Crop Aggregation for video representation LEarning (SCALE), a novel method that enjoys high scalability at both training and inference time.