R-STAN: Residual Spatial-Temporal Attention Network for Action Recognition
Two-stream network architectures can capture temporal and spatial features from videos simultaneously and have achieved excellent performance on video action recognition tasks. However, videos contain a considerable amount of redundant information in both the temporal and spatial dimensions, which makes network learning more complex. To address this problem, we propose the residual spatial-temporal attention network (R-STAN), a feed-forward convolutional neural network that uses residual learning and a spatial-temporal attention mechanism for video action recognition, so that the network focuses more on discriminative temporal and spatial features. In R-STAN, each stream is constructed by stacking residual spatial-temporal attention blocks (R-STABs). The spatial-temporal attention modules integrated into the residual blocks generate attention-aware features along the temporal and spatial dimensions, which largely reduces the redundant information. Combined with residual learning, this allows us to construct a very deep network for learning spatial-temporal information in videos. As the layers go deeper, the attention-aware features from the different R-STABs change adaptively. We validate R-STAN through extensive experiments on the UCF101 and HMDB51 datasets. The experiments show that combining residual learning with the spatial-temporal attention mechanism contributes substantially to video action recognition performance.
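The idea of a residual block with integrated temporal and spatial attention can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the attention formulation, so the function name `residual_st_attention_block`, the weight arguments `w_mix`, `w_t`, and `w_s`, the sigmoid gating, and the way the two attention maps are multiplied into the residual branch are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def residual_st_attention_block(x, w_mix, w_t, w_s):
    """Hypothetical residual block with temporal and spatial attention.

    x:     features of shape (T, C, H, W) for T frames
    w_mix: (C, C) weights of a 1x1 convolution (the residual branch)
    w_t:   (C,) weights that score each frame (temporal attention)
    w_s:   (C,) weights that score each location (spatial attention)
    """
    # Residual branch F(x): 1x1 convolution over channels, then ReLU.
    f = np.maximum(np.einsum("dc,tchw->tdhw", w_mix, x), 0.0)

    # Temporal attention: one gate per frame, computed from
    # spatially pooled features of that frame.
    frame_desc = f.mean(axis=(2, 3))                  # (T, C)
    a_t = sigmoid(frame_desc @ w_t)                   # (T,)

    # Spatial attention: one gate per location in every frame,
    # computed from a projection across channels.
    a_s = sigmoid(np.einsum("c,tchw->thw", w_s, f))   # (T, H, W)

    # Gate the residual branch, then add the identity shortcut.
    attended = f * a_t[:, None, None, None] * a_s[:, None, :, :]
    return x + attended, a_t, a_s


# Usage with random features and weights (illustrative values only).
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))
out, a_t, a_s = residual_st_attention_block(
    x,
    rng.standard_normal((8, 8)),
    rng.standard_normal(8),
    rng.standard_normal(8),
)
```

Because the attention only gates the residual branch, the identity shortcut passes the input through unchanged, which is the property of residual learning that makes stacking many such blocks feasible.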
Task | Dataset | Model | Metric Name | Metric Value | Global Rank | Benchmark |
---|---|---|---|---|---|---|
Action Recognition | HMDB-51 | R-STAN-152 | Average accuracy of 3 splits | 55.16 | # 70 | |
Action Recognition | HMDB-51 | R-STAN-50 | Average accuracy of 3 splits | 62.8 | # 64 | |
Action Recognition | UCF101 | R-STAN-50 | 3-fold Accuracy | 91.5 | # 64 | |
Action Recognition | UCF101 | R-STAN-101 | 3-fold Accuracy | 94.5 | # 49 | |