A Strong and Reproducible Object Detector with Only Public Datasets

This work presents Focal-Stable-DINO, a strong and reproducible object detection model that achieves 64.6 AP on COCO val2017 and 64.8 AP on COCO test-dev with only 700M parameters and no test-time augmentation. It combines the powerful FocalNet-Huge backbone with the effective Stable-DINO detector. Unlike existing SOTA models, which rely on very large parameter counts and complex training techniques applied to large-scale private or merged data, our model is trained exclusively on the publicly available Objects365 dataset, which ensures the reproducibility of our approach.


Results from the Paper


Ranked #5 on Object Detection on COCO minival (using extra training data)

Task              Dataset        Model                            Metric Name  Metric Value  Global Rank
Object Detection  COCO minival   Focal-Stable-DINO (without TTA)  box AP       64.6          #5
Object Detection  COCO test-dev  Focal-Stable-DINO (without TTA)  box mAP      64.8          #5

Methods