DETRs with Collaborative Hybrid Assignments Training

ICCV 2023 · Zhuofan Zong, Guanglu Song, Yu Liu

In this paper, we observe that assigning too few queries as positive samples in DETR with one-to-one set matching leads to sparse supervision on the encoder's output, which considerably hurts the discriminative feature learning of the encoder, and vice versa for attention learning in the decoder. To alleviate this, we present a novel collaborative hybrid assignments training scheme, namely $\mathcal{C}$o-DETR, to learn more efficient and effective DETR-based detectors from versatile label assignment schemes. This new training scheme can easily enhance the encoder's learning ability in end-to-end detectors by training multiple parallel auxiliary heads supervised by one-to-many label assignments such as ATSS and Faster R-CNN. In addition, we construct extra customized positive queries by extracting the positive coordinates from these auxiliary heads to improve the training efficiency of positive samples in the decoder. During inference, these auxiliary heads are discarded, so our method introduces no additional parameters or computational cost to the original detector while requiring no hand-crafted non-maximum suppression (NMS). We conduct extensive experiments to evaluate the effectiveness of the proposed approach on DETR variants, including DAB-DETR, Deformable-DETR, and DINO-Deformable-DETR. The state-of-the-art DINO-Deformable-DETR with Swin-L can be improved from 58.5% to 59.5% AP on COCO val. Surprisingly, when combined with a ViT-L backbone, we achieve 66.0% AP on COCO test-dev and 67.9% AP on LVIS val, outperforming previous methods by clear margins with much smaller model sizes. Code is available at https://github.com/Sense-X/Co-DETR.
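The sparse-supervision observation in the abstract can be illustrated with a toy sketch. This is not the Co-DETR implementation. It assumes a simplified matching cost based on IoU only (DETR's actual cost also includes classification and L1 terms), a brute-force search in place of the Hungarian algorithm, and a fixed IoU threshold of 0.5 as a stand-in for one-to-many assigners such as those used in Faster R-CNN or ATSS. The function names and the boxes are hypothetical.

```python
from itertools import permutations


def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def one_to_one(preds, gts):
    """Optimal one-to-one matching by brute force (toy sizes only).

    Each ground-truth box gets exactly one prediction, chosen to maximize
    the total IoU. Returns the set of positive prediction indices.
    """
    best, best_score = (), -1.0
    for perm in permutations(range(len(preds)), len(gts)):
        score = sum(iou(preds[p], gts[g]) for g, p in enumerate(perm))
        if score > best_score:
            best, best_score = perm, score
    return set(best)


def one_to_many(preds, gts, thr=0.5):
    """Every prediction with IoU >= thr to some ground truth is positive."""
    return {i for i, p in enumerate(preds) if any(iou(p, g) >= thr for g in gts)}


# Two ground-truth objects and six candidate boxes (made-up numbers).
gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [
    (0, 0, 10, 10),    # exact hit on object 0
    (1, 1, 11, 11),    # near object 0
    (0, 0, 9, 10),     # near object 0
    (20, 20, 30, 30),  # exact hit on object 1
    (21, 20, 31, 30),  # near object 1
    (50, 50, 60, 60),  # background
]

pos_o2o = one_to_one(preds, gts)
pos_o2m = one_to_many(preds, gts)
print(len(pos_o2o), len(pos_o2m))  # prints "2 5"
```

Under one-to-one matching the number of positives equals the number of objects, so most candidates receive only a background signal. The one-to-many rule marks every well-overlapping candidate as positive, which is the denser supervision that the auxiliary heads are described as providing during training.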


Results from the Paper


Ranked #1 on Object Detection on LVIS v1.0 val (using extra training data)

Task                   Dataset            Model                   Metric Name  Metric Value  Global Rank
Object Detection       COCO minival       Co-DETR                 box AP       65.9          #1
Object Detection       COCO minival       Co-DETR                 Params (M)   348           #2
Object Detection       COCO minival       Co-DETR (Swin-L)        box AP       64.7          #4
Object Detection       COCO minival       Co-DETR (Swin-L)        Params (M)   218           #1
Object Detection       COCO test-dev      Co-DETR                 box mAP      66.0          #1
Object Detection       COCO test-dev      Co-DETR                 Params (M)   348           #5
Object Detection       COCO test-dev      Co-DETR (Swin-L)        box mAP      64.8          #5
Object Detection       COCO test-dev      Co-DETR (Swin-L)        Params (M)   218           #6
Object Detection       LVIS v1.0 minival  Co-DETR (single-scale)  box AP       72.0          #1
Instance Segmentation  LVIS v1.0 val      Co-DETR (single-scale)  mask AP      56.0          #1
Object Detection       LVIS v1.0 val      Co-DETR (single-scale)  box AP       68.0          #1

Methods