Action Classification
227 papers with code • 24 benchmarks • 30 datasets
Image source: The Kinetics Human Action Video Dataset
Latest papers with no code
After-Stroke Arm Paresis Detection using Kinematic Data
This paper presents an approach for detecting unilateral arm paralysis/weakness using kinematic data.
Proposal-based Temporal Action Localization with Point-level Supervision
Point-level supervised temporal action localization (PTAL) aims at recognizing and localizing actions in untrimmed videos where only a single point (frame) within every action instance is annotated in training data.
SkeleTR: Towards Skeleton-based Action Recognition in the Wild
SkeleTR first models the intra-person skeleton dynamics of each skeleton sequence with graph convolutions, then uses stacked Transformer encoders to capture the person interactions that are important for action recognition in general scenarios.
Semi Supervised Meta Learning for Spatiotemporal Learning
Broadly, we seek to understand the impact of applying meta-learning to existing state-of-the-art representation learning architectures.
Spiking Two-Stream Methods with Unsupervised STDP-based Learning for Action Recognition
Implementing this model with unsupervised STDP-based CSNNs allows us to further study the performance of these networks with video analysis.
How Object Information Improves Skeleton-based Human Action Recognition in Assembly Tasks
Our research sheds light on the benefits of combining skeleton joints with object information for human action recognition in assembly tasks.
Human Action Recognition in Egocentric Perspective Using 2D Object and Hands Pose
Egocentric action recognition is essential for healthcare and assistive technology that relies on egocentric cameras because it allows for the automatic and continuous monitoring of activities of daily living (ADLs) without requiring any conscious effort from the user.
Self-Supervised Video Representation Learning via Latent Time Navigation
Self-supervised video representation learning has aimed at maximizing similarity between different temporal segments of one video, in order to enforce feature persistence over time.
AssemblyHands: Towards Egocentric Activity Understanding via 3D Hand Pose Estimation
To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger dataset.
VicTR: Video-conditioned Text Representations for Activity Recognition
In this paper, we argue the contrary, that better video-VLMs can be designed by focusing more on augmenting text, rather than visual information.