SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

CVPR 2020. Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

Convolutional neural networks typically encode an input image into a series of intermediate features with decreasing resolutions. While this structure is suited to classification tasks, it does not perform well for tasks requiring simultaneous recognition and localization (e.g., object detection)...
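To make the "decreasing resolutions" in the abstract concrete, here is a minimal sketch of how feature-map size shrinks in a conventional scale-decreasing backbone. It assumes each feature level l has stride 2^l relative to the input; the 640x640 input size, the level range 3 to 7, and the helper name `feature_map_sizes` are illustrative assumptions, not values taken from this page.

```python
def feature_map_sizes(input_size, levels):
    """Return {level: (height, width)}, assuming level l has stride 2**l."""
    height, width = input_size
    sizes = {}
    for level in levels:
        stride = 2 ** level
        # Ceiling division mimics "same"-padded strided convolutions.
        sizes[level] = (-(-height // stride), -(-width // stride))
    return sizes


# Hypothetical input size and level range, chosen only for illustration.
sizes = feature_map_sizes((640, 640), range(3, 8))
for level, (h, w) in sizes.items():
    print(f"level {level}: stride {2 ** level:>3}, feature map {h}x{w}")
```

Under these assumptions the deepest level retains only a 5x5 grid, which illustrates why fine localization from late features alone is hard.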

Results from the Paper


| Task | Dataset | Model | Metric | Value | Global rank |
|---|---|---|---|---|---|
| Real-Time Object Detection | COCO | SpineNet-49 (RetinaNet, single-scale, with TRT) | MAP | 46.7 | #4 |
| | | | FPS | 29 | #10 |
| | | | inference time (ms) | 34.3 | #2 |
| Instance Segmentation | COCO minival | SpineNet-190 (Mask R-CNN, single-scale) | mask AP | 46.1 | #2 |
| | | | AP50 | 70.6 | #1 |
| | | | AP75 | 50.1 | #1 |
| | | | APL | 63.6 | #1 |
| | | | APM | 49.2 | #1 |
| | | | APS | 27.5 | #1 |
| Object Detection | COCO minival | SpineNet-190 (RetinaNet, single-scale) | box AP | 52.0 | #5 |
| | | | AP50 | 71.5 | #3 |
| | | | AP75 | 56.5 | #4 |
| | | | APS | 37.4 | #1 |
| | | | APM | 55.6 | #3 |
| | | | APL | 65.0 | #4 |
| Object Detection | COCO minival | SpineNet-190 (Mask R-CNN, single-scale) | box AP | 52.2 | #4 |
| | | | AP50 | 72.6 | #1 |
| | | | AP75 | 57.2 | #2 |
| | | | APS | 33.9 | #4 |
| | | | APM | 54.7 | #4 |
| | | | APL | 68.8 | #1 |
| Instance Segmentation | COCO test-dev | SpineNet-190 (Mask R-CNN, single-scale) | mask AP | 46.3 | #2 |
| | | | AP50 | 71.0 | #2 |
| | | | AP75 | 50.5 | #3 |
| | | | APS | 29.7 | #3 |
| | | | APM | 49.0 | #3 |
| | | | APL | 58.5 | #5 |
| Object Detection | COCO test-dev | SpineNet-190 (RetinaNet, single-scale) | box AP | 52.1 | #6 |
| | | | AP50 | 71.8 | #6 |
| | | | AP75 | 56.5 | #7 |
| | | | APS | 35.4 | #5 |
| | | | APM | 55.0 | #7 |
| | | | APL | 63.6 | #11 |
| Object Detection | COCO test-dev | SpineNet-190 (Mask R-CNN, single-scale) | box AP | 52.5 | #5 |
| | | | AP50 | 73.2 | #2 |
| | | | AP75 | 57.8 | #5 |
| | | | APS | 36.0 | #2 |
| | | | APM | 54.6 | #9 |
| | | | APL | 63.9 | #9 |
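
The FPS and inference-time figures listed for SpineNet-49 can be cross-checked against each other. The sketch below assumes the reported latency is the full per-image time at batch size 1, so throughput is simply its reciprocal; that assumption and the helper name `fps_from_latency_ms` are not stated on this page.

```python
def fps_from_latency_ms(latency_ms):
    """Convert per-image latency in milliseconds to frames per second.

    Assumes one image per forward pass (batch size 1) and no other overhead.
    """
    return 1000.0 / latency_ms


# 34.3 ms is the inference time listed for SpineNet-49 in the results.
fps = fps_from_latency_ms(34.3)
print(f"{fps:.2f} FPS")
```

Under that assumption the result rounds to the listed 29 FPS, so the two reported numbers are consistent.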

Methods used in the Paper