Dance with Flow: Two-in-One Stream Action Detection

CVPR 2019 · Jiaojiao Zhao, Cees G. M. Snoek

The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which is leveraged by the motion modulation layer to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks, and trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCF Sports and J-HMDB.
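
The abstract does not spell out the form of the transformation, so the following is only a minimal sketch of the general idea, assuming the transformation parameters are a per-pixel scale and shift predicted from the motion condition by a 1x1 (pointwise) projection. The function names, the nested-list feature layout, and the toy weights are all hypothetical and are not taken from the paper.

```python
# Hypothetical sketch: feature maps are nested lists indexed [channel][row][col].
# Assumption: modulation is an element-wise scale and shift of the RGB features.

def pointwise_projection(features, weights, bias):
    """1x1 convolution: mix input channels independently at every pixel."""
    height, width = len(features[0]), len(features[0][0])
    return [
        [
            [
                bias[o] + sum(weights[o][c] * features[c][y][x]
                              for c in range(len(features)))
                for x in range(width)
            ]
            for y in range(height)
        ]
        for o in range(len(weights))
    ]


def modulate(rgb_features, motion_condition, scale_params, shift_params):
    """Scale and shift RGB features with parameters predicted from motion."""
    scale = pointwise_projection(motion_condition, *scale_params)
    shift = pointwise_projection(motion_condition, *shift_params)
    return [
        [
            [s * f + b for s, f, b in zip(s_row, f_row, b_row)]
            for s_row, f_row, b_row in zip(s_ch, f_ch, b_ch)
        ]
        for s_ch, f_ch, b_ch in zip(scale, rgb_features, shift)
    ]


# Toy example: 2 RGB feature channels, 1 motion-condition channel, 1x2 maps.
rgb = [[[1.0, 2.0]], [[3.0, 4.0]]]
motion = [[[0.0, 1.0]]]                      # no motion at x=0, motion at x=1
scale_params = ([[1.0], [2.0]], [1.0, 1.0])  # (weights, bias) per output channel
shift_params = ([[0.5], [0.0]], [0.0, 0.0])
out = modulate(rgb, motion, scale_params, shift_params)
print(out)
```

With these toy weights, the pixel with zero motion passes its RGB features through unchanged (scale 1, shift 0), while the pixel with motion is rescaled and shifted. In a trained network the projection weights would be learned end-to-end together with the detector.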


Results from the Paper


 Ranked #1 on Action Detection on UCF Sports (Video-mAP 0.5 metric)

Task                Dataset     Model                   Metric Name       Metric Value   Global Rank
Action Detection    J-HMDB      Two-in-one              Video-mAP 0.5     57.96          #14
Action Detection    J-HMDB      Two-in-one Two Stream   Video-mAP 0.5     74.74          #9
Action Recognition  UCF101      Two-in-one Two Stream   3-fold Accuracy   92             #62
Action Detection    UCF101-24   Two-in-one              Video-mAP 0.2     75.48          #11
Action Detection    UCF101-24   Two-in-one              Video-mAP 0.5     48.31          #12
Action Detection    UCF101-24   Two-in-one Two Stream   Video-mAP 0.2     78.48          #7
Action Detection    UCF101-24   Two-in-one Two Stream   Video-mAP 0.5     50.30          #9
Action Detection    UCF Sports  Two-in-one              Video-mAP 0.5     92.74          #5
Action Detection    UCF Sports  Two-in-one Two Stream   Video-mAP 0.5     96.52          #1
