Generalized Focal Loss V2: Learning Reliable Localization Quality Estimation for Dense Object Detection

Localization Quality Estimation (LQE) is crucial and popular in recent dense object detectors, since it can provide accurate ranking scores that benefit Non-Maximum Suppression and improve detection performance. As a common practice, most existing methods predict LQE scores from vanilla convolutional features shared with object classification or bounding box regression. In this paper, we explore a completely novel and different perspective for LQE: basing it on the learned distributions of the four parameters of the bounding box. These bounding box distributions were introduced as the "General Distribution" in GFLV1, where they describe the uncertainty of the predicted bounding boxes well. This property makes the distribution statistics of a bounding box highly correlated with its real localization quality. Specifically, a bounding box distribution with a sharp peak usually corresponds to high localization quality, and vice versa. By leveraging this close correlation between distribution statistics and real localization quality, we develop a considerably lightweight Distribution-Guided Quality Predictor (DGQP) for reliable LQE on top of GFLV1, thus producing GFLV2. To the best of our knowledge, this is the first attempt in object detection to use a highly relevant statistical representation to facilitate LQE. Extensive experiments demonstrate the effectiveness of our method. Notably, GFLV2 (ResNet-101) achieves 46.2 AP at 14.6 FPS, surpassing the previous state-of-the-art ATSS baseline (43.6 AP at 14.6 FPS) by an absolute 2.6 AP on COCO test-dev, without sacrificing efficiency in either training or inference. Code will be available at https://github.com/implus/GFocalV2.
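The mechanism described above can be sketched in a few lines. This is a minimal pure-Python sketch based on our reading of the GFLV2 design, not the code from the repository above: each of the four box edges is predicted as a discrete distribution over bins (the GFLV1 "General Distribution"), the top-k probabilities of each edge plus their mean form a small statistical feature vector, and a tiny two-layer network maps that vector to a quality score that rescales the classification score. The bin count (17), k=4, hidden width 64, the helper names, and the random, untrained weights are all illustrative assumptions.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over one edge's bin logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_offset(probs):
    """GFLV1-style integral: expected bin index under the distribution."""
    return sum(i * p for i, p in enumerate(probs))


def edge_statistics(probs, k=4):
    """Top-k probabilities of one edge distribution followed by their mean."""
    topk = sorted(probs, reverse=True)[:k]
    return topk + [sum(topk) / k]


def dgqp_features(edge_logits, k=4):
    """Concatenate statistics for the four edges (left, top, right, bottom)."""
    feats = []
    for logits in edge_logits:
        feats.extend(edge_statistics(softmax(logits), k))
    return feats  # length 4 * (k + 1)


class TinyQualityPredictor:
    """Hypothetical two-layer head: FC -> ReLU -> FC -> sigmoid.

    Weights are random and untrained; in a detector they would be learned.
    """

    def __init__(self, in_dim, hidden=64, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.gauss(0.0, 0.1) for _ in range(in_dim)] for _ in range(hidden)]
        self.b1 = [0.0] * hidden
        self.w2 = [rng.gauss(0.0, 0.1) for _ in range(hidden)]
        self.b2 = 0.0

    def __call__(self, x):
        hidden = [
            max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.w1, self.b1)
        ]
        z = sum(w * h for w, h in zip(self.w2, hidden)) + self.b2
        return 1.0 / (1.0 + math.exp(-z))  # quality score in (0, 1)


# Usage: 17 bins per edge (an assumed bin count), one sharp and one flat box.
sharp_logits = [10.0 if i == 5 else 0.0 for i in range(17)]
flat_logits = [0.0] * 17
sharp_feats = dgqp_features([sharp_logits] * 4)
flat_feats = dgqp_features([flat_logits] * 4)

predictor = TinyQualityPredictor(in_dim=len(sharp_feats))
cls_score = 0.8  # hypothetical classification score for one class
joint_sharp = cls_score * predictor(sharp_feats)  # quality-rescaled ranking score
print(sharp_feats[0], flat_feats[0], joint_sharp)
```

Only the feature vector is meaningful here, since the head is untrained: the sharp distribution concentrates almost all of its mass in the top-1 bin (about 0.999), while the flat one spreads it evenly (1/17 per bin). That gap is the signal the abstract describes, and keeping the input to a 20-value statistic is what lets the predictor stay lightweight.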

CVPR 2021

Results from the Paper

Object Detection on COCO test-dev. Each cell gives the metric value followed by its global rank on the benchmark leaderboard in parentheses. Hardware Burden is shown as listed (None or 3G) and was ranked #1 for every entry; Operations per network pass was None (rank #1) for every model.

Model                                 box AP      AP50        AP75        APS         APM         APL         Hardware Burden
GFLV2 (Res2Net-101, DCN)              50.6 (#58)  69.0 (#44)  55.3 (#39)  31.3 (#44)  54.3 (#36)  63.5 (#37)  None
GFLV2 (ResNet-50)                     44.3 (#113) 62.3 (#108) 48.5 (#85)  26.8 (#83)  47.7 (#83)  54.1 (#112) None
GFLV2 (Res2Net-101, DCN, multiscale)  53.3 (#37)  70.9 (#28)  59.2 (#19)  35.7 (#18)  56.1 (#25)  65.6 (#25)  None
GFLV2 (ResNeXt-101, 32x4d, DCN)       49.0 (#67)  67.6 (#54)  53.5 (#46)  29.7 (#54)  52.4 (#46)  61.4 (#50)  3G
GFLV2 (ResNet-101-DCN)                48.3 (#75)  66.5 (#65)  52.8 (#52)  28.8 (#65)  51.9 (#50)  60.7 (#55)  3G
GFLV2 (ResNet-101)                    46.2 (#94)  64.3 (#85)  50.5 (#72)  27.8 (#70)  49.9 (#63)  57.0 (#84)  None