Dance with Flow: Two-in-One Stream Action Detection
The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which the motion modulation layer then uses to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance- or two-stream action detection networks, and is trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of the state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCF Sports and J-HMDB.
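The abstract describes the two new layers only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes the motion condition layer is a single 1x1 convolution with ReLU, that the transformation parameters are a per-location scale and shift (`gamma`, `beta`), and that the flow image has the same spatial size as the RGB feature map. All function names, parameter names, and shapes are hypothetical.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1x1 convolution: x is (C_in, H, W), weight is (C_out, C_in), bias is (C_out,)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def init_params(flow_channels, cond_channels, feat_channels, seed=0):
    """Hypothetical parameter set; gamma starts near 1 and beta near 0."""
    rng = np.random.default_rng(seed)
    return {
        "w_cond": 0.01 * rng.standard_normal((cond_channels, flow_channels)),
        "b_cond": np.zeros(cond_channels),
        "w_gamma": 0.01 * rng.standard_normal((feat_channels, cond_channels)),
        "b_gamma": np.ones(feat_channels),
        "w_beta": 0.01 * rng.standard_normal((feat_channels, cond_channels)),
        "b_beta": np.zeros(feat_channels),
    }

def modulate_rgb_features(rgb_feat, flow_img, params):
    """Modulate low-level RGB features with parameters derived from a flow image.

    rgb_feat: (C_feat, H, W) low-level RGB feature map.
    flow_img: (C_flow, H, W) flow image, assumed already resized to (H, W).
    """
    # Motion condition (assumed form): extract motion features from the flow image.
    cond = np.maximum(conv1x1(flow_img, params["w_cond"], params["b_cond"]), 0.0)
    # Motion modulation (assumed form): map the condition to a scale and a shift.
    gamma = conv1x1(cond, params["w_gamma"], params["b_gamma"])
    beta = conv1x1(cond, params["w_beta"], params["b_beta"])
    # Feature-wise affine transformation of the RGB features.
    return gamma * rgb_feat + beta

params = init_params(flow_channels=3, cond_channels=8, feat_channels=16)
rgb_feat = np.random.default_rng(1).standard_normal((16, 4, 5))
flow_img = np.random.default_rng(2).standard_normal((3, 4, 5))
modulated = modulate_rgb_features(rgb_feat, flow_img, params)
```

Because the modulation is applied to a single RGB stream, this kind of design avoids running a second full network on the flow input, which is consistent with the abstract's claim of roughly half the computation and parameters of a two-stream model.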
CVPR 2019
Results from the Paper
Ranked #1 on Action Detection on UCF Sports (Video-mAP 0.5 metric)
Task | Dataset | Model | Metric Name | Metric Value | Global Rank
---|---|---|---|---|---
Action Detection | J-HMDB | Two-in-one | Video-mAP 0.5 | 57.96 | #14
Action Detection | J-HMDB | Two-in-one Two Stream | Video-mAP 0.5 | 74.74 | #9
Action Recognition | UCF101 | Two-in-one Two Stream | 3-fold Accuracy | 92 | #62
Action Detection | UCF101-24 | Two-in-one | Video-mAP 0.2 | 75.48 | #11
Action Detection | UCF101-24 | Two-in-one | Video-mAP 0.5 | 48.31 | #12
Action Detection | UCF101-24 | Two-in-one Two Stream | Video-mAP 0.2 | 78.48 | #7
Action Detection | UCF101-24 | Two-in-one Two Stream | Video-mAP 0.5 | 50.30 | #9
Action Detection | UCF Sports | Two-in-one | Video-mAP 0.5 | 92.74 | #5
Action Detection | UCF Sports | Two-in-one Two Stream | Video-mAP 0.5 | 96.52 | #1
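
The thresholds in the Video-mAP metric names (0.2 and 0.5) refer to the overlap a detected action tube must reach with a ground-truth tube to count as correct. This page does not spell out the overlap measure, so the sketch below assumes one common definition: temporal IoU of the two tubes multiplied by the mean box IoU over their shared frames. The tube representation (a dict from frame index to box) and the function names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def tube_iou(tube_a, tube_b):
    """Spatio-temporal IoU of two tubes, each a dict {frame_index: box}.

    Assumed definition: temporal IoU times the mean spatial IoU
    over the frames that both tubes cover.
    """
    shared = tube_a.keys() & tube_b.keys()
    if not shared:
        return 0.0
    temporal_iou = len(shared) / len(tube_a.keys() | tube_b.keys())
    spatial_iou = sum(box_iou(tube_a[f], tube_b[f]) for f in shared) / len(shared)
    return temporal_iou * spatial_iou

# Hypothetical tubes: same box in every frame, overlapping on frames 2 and 3 only.
ground_truth = {f: (0, 0, 10, 10) for f in range(0, 4)}
detection = {f: (0, 0, 10, 10) for f in range(2, 6)}
overlap = tube_iou(ground_truth, detection)
is_correct_at_02 = overlap >= 0.2
is_correct_at_05 = overlap >= 0.5
```

Under this assumed definition, a detection with perfect boxes but poor temporal alignment can pass the 0.2 threshold and still fail at 0.5, which is one reason the scores in the table drop as the threshold rises.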