Over the last decade, Convolutional Neural Network (CNN) models have been highly successful in solving complex vision problems.
Our experiments show that (a) state-of-the-art 3D convolutional neural networks obtain disappointing results on such videos, highlighting a lack of true understanding of human actions, and (b) models leveraging body language via human pose are less prone to context biases.
In this work, we propose to use a new class of models known as Temporal Convolutional Neural Networks (TCN) for 3D human action recognition.
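As a rough illustration of what a temporal convolution over skeleton data looks like, the sketch below applies a single causal, dilated 1D convolution along the time axis of a joint-coordinate sequence, then pools over time and classifies. It is not the architecture of the work above: the function names, layer sizes, dilation, and random weights are all assumptions chosen for illustration, as are the 25 joints and 60 classes used in the demo.

```python
import numpy as np

def temporal_conv1d(x, weights, dilation=1):
    """Causal 1D convolution along the time axis.

    x:       (T, C_in) sequence of per-frame features
    weights: (K, C_in, C_out) kernel with K temporal taps
    returns: (T, C_out); output at frame t depends only on frames <= t
    """
    T, c_in = x.shape
    K, w_in, c_out = weights.shape
    assert c_in == w_in, "channel mismatch between input and kernel"
    pad = (K - 1) * dilation
    # Left-pad with zeros so the output keeps length T and stays causal.
    x_padded = np.vstack([np.zeros((pad, c_in)), x])
    out = np.zeros((T, c_out))
    for k in range(K):
        # Tap k looks back (K - 1 - k) * dilation frames from frame t.
        start = k * dilation
        out += x_padded[start:start + T] @ weights[k]
    return out

def classify_sequence(skeleton, conv_w, fc_w):
    """Hypothetical one-layer pipeline: conv -> ReLU -> time pooling -> linear."""
    T = skeleton.shape[0]
    x = skeleton.reshape(T, -1)              # (T, J * 3) flattened joints
    h = np.maximum(temporal_conv1d(x, conv_w, dilation=2), 0.0)
    pooled = h.mean(axis=0)                  # global average over time
    logits = pooled @ fc_w
    return int(np.argmax(logits))

# Demo with random data and untrained weights (illustrative sizes only).
rng = np.random.default_rng(0)
skeleton = rng.normal(size=(40, 25, 3))      # 40 frames, 25 joints, xyz
conv_w = rng.normal(size=(3, 75, 16)) * 0.1  # 3 temporal taps, 16 filters
fc_w = rng.normal(size=(16, 60)) * 0.1       # 60 action classes
predicted = classify_sequence(skeleton, conv_w, fc_w)
print("predicted class index:", predicted)
```

A practical model would stack several such layers with increasing dilation and learn the weights by gradient descent; the single layer here only shows how the convolution slides over frames rather than over image pixels.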
Recent approaches in depth-based human activity analysis have achieved outstanding performance and demonstrated the effectiveness of 3D representations for action classification.
#4 best model for Skeleton Based Action Recognition on Varying-view RGB-D Action-Skeleton
Each available 3DV voxel intrinsically encodes 3D spatial and motion features jointly.
The proposed representation has the advantage of combining the use of reference joints and a tree structure skeleton.
#3 best model for Action Recognition In Videos on NTU RGB+D 120
Due to the availability of large-scale skeleton datasets, 3D human action recognition has recently attracted the attention of the computer vision community.
#4 best model for Action Recognition In Videos on NTU RGB+D 120
The proposed method achieved state-of-the-art performance on the NTU RGB+D dataset for 3D human action analysis.
#50 best model for Skeleton Based Action Recognition on NTU RGB+D