Learning Spatial Fusion for Single-Shot Object Detection

21 Nov 2019 · Songtao Liu, Di Huang, Yunhong Wang

Pyramidal feature representation is the common practice for addressing scale variation in object detection. However, inconsistency across feature scales is a primary limitation of single-shot detectors based on feature pyramids. In this work, we propose a novel, data-driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns to spatially filter conflicting information to suppress the inconsistency, thus improving the scale invariance of features, and adds almost no inference overhead. With the ASFF strategy and a solid YOLOv3 baseline, we achieve the best speed-accuracy trade-off on the MS COCO dataset, reporting 38.1% AP at 60 FPS, 42.4% AP at 45 FPS and 43.9% AP at 29 FPS. The code is available at https://github.com/ruinmessi/ASFF
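
The abstract does not spell out the fusion rule, so the following is only a minimal sketch of what learned spatial fusion could look like. It assumes the pyramid levels have already been resized to a common shape, that each level contributes one scalar logit per pixel, and that a softmax across levels turns those logits into per-pixel weights. The names `fuse_levels` and `weight_vectors` are hypothetical, the random vectors merely stand in for learned 1x1 convolutions, and none of this is taken from the released code.

```python
import numpy as np


def softmax(logits, axis=0):
    """Numerically stable softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def fuse_levels(features, weight_vectors):
    """Fuse same-shaped feature maps with per-pixel softmax weights.

    features:       list of (C, H, W) arrays, assumed already resized
                    to one common resolution and channel count.
    weight_vectors: list of (C,) arrays; hypothetical stand-ins for the
                    learned 1x1 convolutions that score each level.
    """
    # One scalar logit per pixel and per level: shape (L, H, W).
    logits = np.stack([
        np.tensordot(w, f, axes=([0], [0]))
        for f, w in zip(features, weight_vectors)
    ])
    # Weights are positive and sum to 1 across levels at every pixel.
    weights = softmax(logits, axis=0)
    stacked = np.stack(features)  # (L, C, H, W)
    # Broadcast each level's weight map over its channels, then sum levels.
    fused = (weights[:, None, :, :] * stacked).sum(axis=0)
    return fused, weights


# Toy data: three pyramid levels, 8 channels, 4x4 spatial grid.
rng = np.random.default_rng(0)
features = [rng.standard_normal((8, 4, 4)) for _ in range(3)]
weight_vectors = [rng.standard_normal(8) for _ in range(3)]
fused, weights = fuse_levels(features, weight_vectors)
```

Because the weights form a convex combination at each pixel, a level whose features conflict with the others at some location can be driven toward zero weight there while still contributing elsewhere. The extra cost in this sketch is one dot product per pixel per level plus a weighted sum, which is small relative to a detection backbone.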

Results from the Paper

Task:    Object Detection
Dataset: COCO test-dev
Model:   YOLOv3 @800 + ASFF* (Darknet-53)

Metric Name                   Metric Value   Global Rank
box mAP                       43.9           # 136
AP50                          64.1           # 82
AP75                          49.2           # 73
APS                           27.0           # 71
APM                           46.6           # 85
APL                           53.4           # 106
Hardware Burden               None           # 1
Operations per network pass   None           # 1