Three-Stream Convolutional Neural Network With Multi-Task and Ensemble Learning for 3D Action Recognition

In this paper, we propose a three-stream convolutional neural network (3SCNN) for action recognition from skeleton sequences. It aims to fully exploit the skeleton data by extracting, learning, fusing and inferring from multiple motion-related features: 3D joint positions, joint displacements across adjacent frames, and oriented bone segments. The proposed 3SCNN involves three sequential stages. The first stage enriches the three independently extracted features through co-occurrence feature learning. The second stage applies multi-channel pairwise fusion to exploit the complementary and diverse nature of the three features. The third stage is a multi-task and ensemble learning network that further improves the generalization ability of 3SCNN. Experimental results on the standard NTU RGB+D dataset show the effectiveness of the proposed multi-stream feature learning, fusion and inference method for skeleton-based 3D action recognition.
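To make the three input features concrete, the following is a minimal sketch of how joint positions, frame-to-frame joint displacements, and bone vectors might be derived from one skeleton sequence. The abstract does not specify these computations, so everything here is an assumption made for illustration: the function name `three_stream_features`, the toy 5-joint `PARENTS` table (not the NTU RGB+D joint layout), the zero displacement assigned to the first frame, and the reading of an "oriented bone segment" as the vector from a parent joint to its child.

```python
import numpy as np

# Hypothetical parent table for a toy 5-joint skeleton (NOT the NTU RGB+D layout).
# PARENTS[j] is the joint that joint j attaches to; the root (joint 0) points to itself.
PARENTS = np.array([0, 0, 1, 1, 3])


def three_stream_features(joints, parents=PARENTS):
    """Derive three per-frame feature arrays from a skeleton sequence.

    joints: array of shape (T, J, 3) holding 3D joint coordinates per frame.
    Returns (positions, displacements, bones), each of shape (T, J, 3).
    """
    joints = np.asarray(joints, dtype=np.float64)

    # Stream 1: raw 3D joint positions.
    positions = joints

    # Stream 2: displacement of each joint between adjacent frames.
    # Assumption: the first frame has no predecessor, so its displacement is zero.
    displacements = np.zeros_like(joints)
    displacements[1:] = joints[1:] - joints[:-1]

    # Stream 3: bone vectors pointing from each parent joint to its child.
    # The root's parent is itself, so its bone vector is zero.
    bones = joints - joints[:, parents]

    return positions, displacements, bones
```

Each returned array keeps the (frames, joints, 3) layout, so the three streams could be fed to separate convolutional branches with identical input shapes. How 3SCNN actually normalizes or arranges these inputs is not stated in the abstract.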


Datasets


Task                               | Dataset   | Model | Metric Name   | Metric Value | Global Rank
Skeleton Based Action Recognition  | NTU RGB+D | 3SCNN | Accuracy (CV) | 93.7         | # 60
Skeleton Based Action Recognition  | NTU RGB+D | 3SCNN | Accuracy (CS) | 88.6         | # 54
