Enhanced skeleton visualization for view invariant human action recognition
Human action recognition based on skeletons has wide applications in human–computer interaction and intelligent surveillance. However, view variations and noisy data make this task challenging. Moreover, effectively representing spatio-temporal skeleton sequences remains an open problem. To address these problems jointly, this work presents an enhanced skeleton visualization method for view-invariant human action recognition. Our method consists of three stages. First, a sequence-based view-invariant transform is developed to eliminate the effect of view variations on the spatio-temporal locations of skeleton joints. Second, the transformed skeletons are visualized as a series of color images, which implicitly encode the spatio-temporal information of skeleton joints; visual and motion enhancement methods are then applied to the color images to enhance their local patterns. Third, a convolutional neural network-based model is adopted to extract robust and discriminative features from the color images. The final action class scores are generated by decision-level fusion of deep features. Extensive experiments on four challenging datasets consistently demonstrate the superiority of our method.
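The abstract does not spell out the details of each stage, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes a sequence is a list of frames, each a list of `(x, y, z)` joint positions. The function names `center_sequence`, `skeleton_to_image`, and `fuse_scores` are hypothetical. The transform here is a plain translation (the method's actual view-invariant transform is not described in this text), the image encoding is a simple min-max mapping of coordinates to RGB without any enhancement step, and product fusion is assumed as one common form of decision-level fusion.

```python
import math

def center_sequence(seq, root=0):
    """Translate the whole sequence so the first frame's root joint is the origin.

    One offset is used for every frame (sequence-based), so relative motion
    between frames is preserved. This is a simplification: no rotation is applied.
    """
    ox, oy, oz = seq[0][root]
    return [[(x - ox, y - oy, z - oz) for (x, y, z) in frame] for frame in seq]

def skeleton_to_image(seq):
    """Encode a sequence as a color image: rows are joints, columns are frames.

    Each pixel is an (R, G, B) tuple obtained by min-max scaling the joint's
    (x, y, z) over the whole sequence to the range 0..255.
    """
    coords = [c for frame in seq for joint in frame for c in joint]
    lo, hi = min(coords), max(coords)
    span = (hi - lo) or 1.0  # avoid division by zero for a static point cloud
    n_joints = len(seq[0])
    return [
        [tuple(int(255 * (c - lo) / span) for c in seq[t][j])
         for t in range(len(seq))]
        for j in range(n_joints)
    ]

def fuse_scores(score_lists):
    """Decision-level fusion (assumed product rule) of per-stream class scores.

    Each element of score_lists is one model's class probabilities; the
    element-wise product is renormalized to sum to 1.
    """
    n_classes = len(score_lists[0])
    fused = [math.prod(s[k] for s in score_lists) for k in range(n_classes)]
    total = sum(fused) or 1.0
    return [f / total for f in fused]

# Toy sequence: 2 frames, 2 joints.
seq = [[(1.0, 1.0, 1.0), (2.0, 1.0, 1.0)],
       [(1.0, 2.0, 1.0), (2.0, 2.0, 3.0)]]
centered = center_sequence(seq)
image = skeleton_to_image(centered)
fused = fuse_scores([[0.6, 0.4], [0.5, 0.5]])
```

In a full pipeline, each image would be passed to a CNN, and the per-image class scores would be the inputs to `fuse_scores`.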
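| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Skeleton Based Action Recognition | NTU RGB+D | Skeleton Visualization | Accuracy (CV) | 87.2% | # 100 |
| Skeleton Based Action Recognition | NTU RGB+D | Skeleton Visualization | Accuracy (CS) | 80.0% | # 103 |
| Skeleton Based Action Recognition | NTU RGB+D | Skeleton Visualization | Ensembled Modalities | 4 | # 2 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | Skeleton Visualization (Single Stream) | Accuracy (Cross-Subject) | 60.3% | # 69 |
| Skeleton Based Action Recognition | NTU RGB+D 120 | Skeleton Visualization (Single Stream) | Accuracy (Cross-Setup) | 63.2% | # 64 |
| Skeleton Based Action Recognition | UWA3D | ESV (Synthesized + Pre-trained) | Accuracy | 73.8% | # 2 |
| Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | SK-CNN | Accuracy (CS) | 59% | # 5 |
| Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | SK-CNN | Accuracy (CV I) | 26% | # 2 |
| Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | SK-CNN | Accuracy (CV II) | 68% | # 2 |
| Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | SK-CNN | Accuracy (AV I) | 43% | # 4 |
| Skeleton Based Action Recognition | Varying-view RGB-D Action-Skeleton | SK-CNN | Accuracy (AV II) | 77% | # 1 |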