Simple Training Strategies and Model Scaling for Object Detection

30 Jun 2021  ·  Xianzhi Du, Barret Zoph, Wei-Chih Hung, Tsung-Yi Lin

The speed-accuracy Pareto curve of object detection systems has advanced through a combination of better model architectures, training methods, and inference methods. In this paper, we methodically evaluate a variety of these techniques to understand where most of the improvements in modern detection systems come from. We benchmark these improvements on the vanilla ResNet-FPN backbone with RetinaNet and RCNN detectors. The vanilla detectors are improved by 7.7% in accuracy while being 30% faster. We further provide simple scaling strategies to generate a family of models that form two Pareto curves, named RetinaNet-RS and Cascade RCNN-RS. These simple rescaled detectors explore the speed-accuracy trade-off between the one-stage RetinaNet detectors and two-stage RCNN detectors. Our largest Cascade RCNN-RS models achieve 52.9% AP with a ResNet152-FPN backbone and 53.6% AP with a SpineNet143L backbone. Finally, we show that the ResNet architecture, with three minor architectural changes, outperforms EfficientNet as the backbone for object detection and instance segmentation systems.
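As a rough illustration of what a speed-accuracy Pareto curve means here, the sketch below keeps only those models that no other model beats on both latency and AP. The function name `pareto_front`, the model names, and all latency and AP numbers are hypothetical placeholders chosen for illustration; they are not results from the paper.

```python
from typing import List, Tuple

# (name, latency in ms, AP); lower latency is better, higher AP is better.
Model = Tuple[str, float, float]


def pareto_front(models: List[Model]) -> List[Model]:
    """Return the models that are not dominated in (latency, AP)."""
    # Sort by latency ascending; on equal latency, put the higher AP first.
    ordered = sorted(models, key=lambda m: (m[1], -m[2]))
    front: List[Model] = []
    best_ap = float("-inf")
    for name, latency_ms, ap in ordered:
        # A model joins the front only if it is more accurate than
        # every model that is at least as fast.
        if ap > best_ap:
            front.append((name, latency_ms, ap))
            best_ap = ap
    return front


# Hypothetical placeholder measurements, not values from the paper.
candidates: List[Model] = [
    ("model_a", 20.0, 40.0),
    ("model_b", 30.0, 39.0),  # slower and less accurate than model_a
    ("model_c", 35.0, 45.0),
    ("model_d", 60.0, 44.0),  # slower and less accurate than model_c
    ("model_e", 80.0, 50.0),
]

front = pareto_front(candidates)
print([name for name, _, _ in front])
```

Under these assumptions, a family of detectors "forms a Pareto curve" when each member survives this filter against the other candidates being compared.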


Results from the Paper


Task              Dataset       Model                                          Metric Name  Metric Value  Global Rank
----------------  ------------  ---------------------------------------------  -----------  ------------  -----------
Object Detection  COCO minival  Cascade RCNN-RS (SpineNet-143L, single scale)  box AP       53.6          # 57
                                                                               APS          34.5          # 11
                                                                               APM          56.7          # 10
                                                                               APL          70.6          # 8
Object Detection  COCO minival  Cascade RCNN-RS (ResNet-200, single scale)     box AP       53.1          # 60
                                                                               APS          33.9          # 12
                                                                               APM          56.2          # 12
                                                                               APL          70.3          # 9

Methods