3D Action Recognition
34 papers with code • 3 benchmarks • 14 datasets
Image: Rahmani et al.
Latest papers
On the Utility of 3D Hand Poses for Action Recognition
3D hand poses are an under-explored modality for action recognition.
Masked Motion Predictors are Strong 3D Action Representation Learners
To be specific, the proposed MAMP takes as input the masked spatio-temporal skeleton sequence and predicts the corresponding temporal motion of the masked human joints.
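As a rough illustration of the idea above, the sketch below builds motion targets for masked joints from a skeleton sequence. It assumes that "temporal motion" means the frame-to-frame displacement of each joint and that positions are masked uniformly at random; both are assumptions for illustration, and the function names are hypothetical rather than taken from the MAMP code.

```python
import random


def temporal_motion(seq):
    """Frame-to-frame displacement; seq[t][j] is an (x, y, z) tuple."""
    motion = []
    for t in range(len(seq) - 1):
        motion.append([
            tuple(b - a for a, b in zip(seq[t][j], seq[t + 1][j]))
            for j in range(len(seq[t]))
        ])
    return motion


def mask_and_targets(seq, mask_ratio=0.5, seed=0):
    """Split (frame, joint) positions into visible inputs and masked motion targets.

    Positions range over frames that have a successor, so the last frame
    is used only to compute motion. Uniform random masking is an assumption.
    """
    rng = random.Random(seed)
    motion = temporal_motion(seq)
    positions = [(t, j) for t in range(len(motion)) for j in range(len(motion[0]))]
    n_masked = int(round(mask_ratio * len(positions)))
    masked = set(rng.sample(positions, n_masked))
    # The encoder would see only the visible joint coordinates ...
    visible = {pos: seq[pos[0]][pos[1]] for pos in positions if pos not in masked}
    # ... and the predictor would regress the motion at the masked positions.
    targets = {pos: motion[pos[0]][pos[1]] for pos in masked}
    return visible, targets
```

A model trained this way would be scored only on `targets`, for example with a mean squared error between predicted and true motion.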
Interactive Spatiotemporal Token Attention Network for Skeleton-based General Interactive Action Recognition
To address these problems, we propose an Interactive Spatiotemporal Token Attention Network (ISTA-Net), which simultaneously models spatial, temporal, and interactive relations.
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
In this work, we formulate the cross-modal interaction as a bidirectional knowledge distillation problem.
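A minimal sketch of what a bidirectional distillation objective can look like: each modality produces a distribution over the same set of candidates, and each one is pulled toward the other. The snippet above does not state the exact loss, so the symmetric KL form, the shared temperature, and all function names here are assumptions for illustration only.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def bidirectional_distillation_loss(logits_a, logits_b, temperature=1.0):
    """Sum of both distillation directions between modality A and modality B."""
    p_a = softmax(logits_a, temperature)
    p_b = softmax(logits_b, temperature)
    # A -> B and B -> A: each modality acts as teacher for the other.
    return kl_divergence(p_a, p_b) + kl_divergence(p_b, p_a)
```

In a training loop, the teacher side of each direction would typically be detached from the gradient; that detail is omitted here because the sketch is framework-free.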
Collaborating Domain-shared and Target-specific Feature Clustering for Cross-domain 3D Action Recognition
Furthermore, to leverage the complementarity of domain-shared features and target-specific features, we propose a novel collaborative clustering strategy to enforce pair-wise relationship consistency between the two branches.
Multi-Scale Spatial Temporal Graph Convolutional Network for Skeleton-Based Action Recognition
To solve this problem, we present a multi-scale spatial graph convolution (MS-GC) module and a multi-scale temporal graph convolution (MT-GC) module to enrich the receptive field of the model in spatial and temporal dimensions.
PSTNet: Point Spatio-Temporal Convolution on Point Cloud Sequences
Then, a spatial convolution is employed to capture the local structure of points in the 3D space, and a temporal convolution is used to model the dynamics of the spatial regions along the time dimension.
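The decoupling described above, a spatial step per frame followed by a temporal step across frames, can be sketched without any learned weights. The example below is a simplified stand-in and not the PSTNet operator: the spatial step averages displacements to neighbours within a radius, the temporal step is a fixed weighted sum, and every name and parameter is hypothetical.

```python
import math


def spatial_feature(anchor, points, radius):
    """Mean displacement from the anchor to points within `radius`.

    Stands in for a learned spatial convolution over a local neighbourhood.
    """
    neighbours = [p for p in points if math.dist(anchor, p) <= radius]
    if not neighbours:
        return (0.0, 0.0, 0.0)
    n = len(neighbours)
    return tuple(sum(p[i] - anchor[i] for p in neighbours) / n for i in range(3))


def temporal_feature(frame_features, weights):
    """Weighted sum of per-frame spatial features over a temporal window."""
    return tuple(
        sum(w * f[i] for w, f in zip(weights, frame_features))
        for i in range(3)
    )


def spatio_temporal_feature(anchor, frames, radius, weights):
    """Apply the spatial step to each frame, then the temporal step across frames."""
    per_frame = [spatial_feature(anchor, pts, radius) for pts in frames]
    return temporal_feature(per_frame, weights)
```

With `weights` such as `[-1, 0, 1]`, the temporal step behaves like a finite difference, so the output reflects how the local structure around the anchor changes over the window.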
Assembly101: A Large-Scale Multi-View Video Dataset for Understanding Procedural Activities
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.
No Pain, Big Gain: Classify Dynamic Point Cloud Sequences with Static Models by Fitting Feature-level Space-time Surfaces
Scene flow is a powerful tool for capturing the motion field of 3D point clouds.
Domain Knowledge-Informed Self-Supervised Representations for Workout Form Assessment
To that end, we propose to learn exercise-oriented image and video representations from unlabeled samples such that a small dataset annotated by experts suffices for supervised error detection.