Deep High-Resolution Representation Learning for Visual Recognition

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. It has two key characteristics: (i) it connects the high-to-low resolution convolution streams in parallel; (ii) it repeatedly exchanges information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All the code is available at https://github.com/HRNet.
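The two characteristics above, parallel streams and repeated exchange across resolutions, can be illustrated with a toy exchange unit. This is a minimal sketch under stated assumptions, not the HRNet implementation: each stream is a single-channel 2D grid, the learned stride-2 convolution used for downsampling is replaced by 2x2 average pooling, the convolution before upsampling is omitted, and the helper names (`downsample2`, `upsample2`, `resample`, `exchange`) are hypothetical. What it does show is the fusion rule: every output stream is the sum of all input streams, each resampled to that output's resolution, followed by a ReLU.

```python
from typing import List

Grid = List[List[float]]  # one single-channel feature map, indexed [row][col]


def downsample2(x: Grid) -> Grid:
    """Halve height and width with 2x2 average pooling.

    Assumption: stand-in for a learned stride-2 3x3 convolution.
    """
    h, w = len(x), len(x[0])
    return [
        [
            (x[2 * r][2 * c] + x[2 * r][2 * c + 1]
             + x[2 * r + 1][2 * c] + x[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w // 2)
        ]
        for r in range(h // 2)
    ]


def upsample2(x: Grid) -> Grid:
    """Double height and width by nearest-neighbour repetition."""
    out: Grid = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def resample(x: Grid, src: int, dst: int) -> Grid:
    """Move a map from resolution level src to level dst.

    Level k is 1/2**k of the highest resolution; each step changes size by 2x.
    """
    while src < dst:
        x, src = downsample2(x), src + 1
    while src > dst:
        x, src = upsample2(x), src - 1
    return x


def exchange(streams: List[Grid]) -> List[Grid]:
    """One exchange unit over parallel streams ordered from high to low resolution."""
    fused: List[Grid] = []
    for dst in range(len(streams)):
        # Bring every stream to the resolution of output stream `dst`.
        aligned = [resample(s, src, dst) for src, s in enumerate(streams)]
        rows, cols = len(aligned[0]), len(aligned[0][0])
        # Element-wise sum over all aligned inputs, then ReLU.
        fused.append([
            [max(0.0, sum(m[r][c] for m in aligned)) for c in range(cols)]
            for r in range(rows)
        ])
    return fused


if __name__ == "__main__":
    high = [[float(4 * r + c) for c in range(4)] for r in range(4)]  # 4x4 stream
    low = [[100.0, 200.0], [300.0, 400.0]]                           # 2x2 stream
    out_high, out_low = exchange([high, low])
    print(out_low)  # [[102.5, 204.5], [310.5, 412.5]]
```

Because the high-resolution stream is never discarded, stacking several such units keeps a full-resolution output at every stage while it keeps receiving information from the coarser streams.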


Results from the Paper


 Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Task | Dataset | Model | Metric: value (global rank)
Instance Segmentation | BDD100K val | HRNet | AP: 22.5 (#2)
Semantic Segmentation | Cityscapes test | HRNetV2 (train+val) | Mean IoU (class): 81.6% (#37)
Semantic Segmentation | Cityscapes val | HRNetV2 (HRNetV2-W40) | mIoU: 80.2 (#47)
Semantic Segmentation | Cityscapes val | HRNetV2 (HRNetV2-W48) | mIoU: 81.1 (#39)
Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W48) | box AP: 44.6 (#115); AP50: 62.7 (#57); AP75: 48.7 (#35); APS: 26.3 (#39); APM: 48.1 (#28); APL: 58.5 (#40)
Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W18) | box AP: 41.3 (#151); AP50: 59.2 (#85); AP75: 44.9 (#62); APS: 23.7 (#59); APM: 44.2 (#57); APL: 54.1 (#60)
Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W32) | box AP: 40.9 (#156); AP50: 61.8 (#63); AP75: 44.8 (#63); APS: 24.4 (#54); APM: 43.7 (#61); APL: 53.3 (#63)
Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W18) | box AP: 38.0 (#185); AP50: 58.9 (#89); AP75: 41.5 (#82); APS: 22.6 (#67); APM: 40.8 (#74); APL: 49.6 (#78)
Object Detection | COCO minival | HTC (HRNetV2p-W48) | box AP: 47.0 (#93); APS: 28.8 (#22); APM: 50.3 (#20); APL: 62.2 (#23)
Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W48, cascade) | box AP: 46.0 (#102); APS: 27.5 (#28); APL: 60.1 (#28)
Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W32, cascade) | APS: 26.1 (#40); APM: 47.9 (#31)
Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W18) | box AP: 39.2 (#176); APM: 41.7 (#71); APL: 51.0 (#73)
Object Detection | COCO minival | HTC (HRNetV2p-W32) | box AP: 45.3 (#106); APS: 27.0 (#30); APM: 48.4 (#24); APL: 59.5 (#32)
Instance Segmentation | COCO minival | HTC (HRNetV2p-W48) | mask AP: 41.0 (#65)
Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W48) | box AP: 41.8 (#146); AP50: 62.8 (#55); AP75: 45.9 (#56); APM: 44.7 (#52); APL: 54.6 (#59)
Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W32) | box AP: 43.7 (#124); AP50: 61.7 (#64); AP75: 47.7 (#41); APS: 25.6 (#45); APM: 46.5 (#40); APL: 57.4 (#47)
Object Detection | COCO minival | HTC (HRNetV2p-W18) | box AP: 43.1 (#132); APS: 26.6 (#36); APM: 46.0 (#43)
Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W32) | box AP: 42.3 (#140); APS: 25.0 (#51); APM: 45.4 (#50)
Object Detection | COCO test-dev | Mask R-CNN (HRNetV2p-W32 + cascade) | AP50: 62.5 (#108); AP75: 48.6 (#83); APL: 56.3 (#88); Hardware Burden: 16G (#1); Operations per network pass: 50.6G (#1)
Object Detection | COCO test-dev | HTC (HRNetV2p-W48) | box mAP: 47.3 (#114); AP50: 65.9 (#69); AP75: 51.2 (#63); APS: 28.0 (#66); APM: 49.7 (#62); APL: 59.8 (#60); Hardware Burden: 15G (#1); Operations per network pass: 71.7G (#1)
Object Detection | COCO test-dev | Cascade R-CNN (HRNetV2p-W48) | AP75: 48.6 (#83); APS: 26.0 (#87); APM: 47.3 (#83); APL: 56.3 (#88)
Object Detection | COCO test-dev | Faster R-CNN (HRNetV2p-W48) | box mAP: 42.4 (#170); AP50: 63.6 (#97); AP75: 46.4 (#110); APS: 24.9 (#95); APM: 44.6 (#113); APL: 53.0 (#119); Hardware Burden: 16G (#1); Operations per network pass: 20.8G (#1)
Object Detection | COCO test-dev | FCOS (HRNetV2p-W48) | box mAP: 40.5 (#187); AP50: 59.3 (#137); APS: 23.4 (#111); APM: 42.6 (#128); APL: 51.0 (#132); Hardware Burden: 16G (#1); Operations per network pass: 27.3G (#1)
Object Detection | COCO test-dev | CenterNet (HRNetV2-W48) | box mAP: 43.5 (#152); AP75: 46.5 (#108); APS: 22.2 (#120); APL: 57.8 (#71); Hardware Burden: 16G (#1); Operations per network pass: 21.7G (#1)
Object Detection | COCO test-dev | Mask R-CNN (HRNetV2p-W48 + cascade) | box mAP: 46.1 (#124); AP50: 64.0 (#93); AP75: 50.3 (#76); APS: 27.1 (#74); APM: 48.6 (#71); APL: 58.3 (#66); Hardware Burden: 15G (#1); Operations per network pass: 61.8G (#1)
Face Alignment | COFW-68 | HRNetV2-W18 | NME (inter-ocular): 5.06 (#6)
Semantic Segmentation | DADA-seg | HRNet (ACDC) | mIoU: 27.5 (#9)
Dichotomous Image Segmentation | DIS-TE1 | HRNet | max F-measure: 0.668 (#9); weighted F-measure: 0.579 (#8); MAE: 0.088 (#8); S-measure: 0.742 (#8); E-measure: 0.797 (#9); HCE: 262 (#15)
Dichotomous Image Segmentation | DIS-TE2 | HRNet | max F-measure: 0.747 (#8); weighted F-measure: 0.664 (#8); MAE: 0.087 (#8); S-measure: 0.784 (#8); E-measure: 0.840 (#5); HCE: 555 (#13)
Dichotomous Image Segmentation | DIS-TE3 | HRNet | max F-measure: 0.784 (#8); weighted F-measure: 0.700 (#8); MAE: 0.080 (#8); S-measure: 0.805 (#7); E-measure: 0.869 (#6); HCE: 1049 (#12)
Dichotomous Image Segmentation | DIS-TE4 | HRNet | max F-measure: 0.772 (#8); weighted F-measure: 0.687 (#8); MAE: 0.092 (#8); S-measure: 0.792 (#8); E-measure: 0.854 (#5); HCE: 3864 (#19)
Dichotomous Image Segmentation | DIS-VD | HRNet | max F-measure: 0.726 (#8); weighted F-measure: 0.641 (#6); MAE: 0.095 (#7); S-measure: 0.767 (#8); E-measure: 0.824 (#5); HCE: 1560 (#13)
Thermal Image Segmentation | MFN Dataset | HRNet | mIoU: 51.7 (#33)
Semantic Segmentation | PASCAL Context | CFNet (ResNet-101) | mIoU: 54.0 (#32)
Semantic Segmentation | PASCAL Context | HRNetV2 (HRNetV2-W48) | mIoU: 54 (#32)
Semantic Segmentation | SynPASS | HRNet | mIoU: 34.09% (#4)
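Most segmentation rows above report mean intersection-over-union (mIoU). As a reminder of what that number measures, the sketch below builds a confusion matrix from flattened label maps, computes per-class IoU = TP / (TP + FP + FN), and averages over classes. It is a generic illustration, not the official Cityscapes or PASCAL Context evaluation code: the helper names `confusion_matrix` and `mean_iou` are hypothetical, the ignore label 255 is an assumption (a common convention), and classes that never appear in either the ground truth or the prediction are skipped.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int,
                     ignore_index: int = 255) -> List[List[int]]:
    """cm[g][p] = number of pixels whose label is g and whose prediction is p."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        if g == ignore_index:
            continue  # pixels with the (assumed) ignore label are not scored
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Average of per-class IoU = TP / (TP + FP + FN), skipping empty classes."""
    n = len(cm)
    ious = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                        # labelled k, predicted as another class
        fp = sum(cm[g][k] for g in range(n)) - tp   # predicted k, labelled as another class
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


if __name__ == "__main__":
    gt = [0, 0, 1, 1, 2, 255]
    pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(gt, pred, num_classes=3)
    print(round(mean_iou(cm), 4))  # per-class IoU 0.5, 2/3, 1.0
```

Accumulating a single confusion matrix over the whole evaluation set, rather than averaging per-image mIoU, is the usual choice because it weights every pixel equally.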

Results from Other Papers


Task | Dataset | Model | Metric: value (rank)
Face Alignment | 300W | HRNet | NME_inter-ocular (%, Full): 3.32 (#24); NME_inter-ocular (%, Common): 2.87 (#21); NME_inter-ocular (%, Challenge): 5.15 (#24)
Face Alignment | COFW | HRNet | NME (inter-ocular): 3.45 (#12)
Face Alignment | WFLW | HRNet | NME (inter-ocular): 4.60 (#21)
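The face-alignment rows report normalized mean error (NME) with inter-ocular normalization: for each face, the mean Euclidean distance between predicted and ground-truth landmarks is divided by the inter-ocular distance, and the per-image values are averaged, typically reported as a percentage. A minimal sketch follows. Which two landmarks define the inter-ocular distance depends on each benchmark's annotation scheme, so the sketch takes their indices as parameters; the names `nme_inter_ocular`, `dataset_nme`, `left_eye`, and `right_eye` are hypothetical, and the points in the usage lines are toy values.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def nme_inter_ocular(pred: Sequence[Point], gt: Sequence[Point],
                     left_eye: int, right_eye: int) -> float:
    """NME for one face: mean landmark error divided by the inter-ocular distance."""
    norm = math.dist(gt[left_eye], gt[right_eye])  # normalizer from ground-truth eye points
    errors = [math.dist(p, g) for p, g in zip(pred, gt)]
    return sum(errors) / (len(errors) * norm)


def dataset_nme(preds: List[Sequence[Point]], gts: List[Sequence[Point]],
                left_eye: int, right_eye: int) -> float:
    """Average the per-image NME over a dataset and express it as a percentage."""
    per_image = [nme_inter_ocular(p, g, left_eye, right_eye)
                 for p, g in zip(preds, gts)]
    return 100.0 * sum(per_image) / len(per_image)


if __name__ == "__main__":
    gt = [(0.0, 0.0), (10.0, 0.0), (5.0, 5.0)]    # eyes at indices 0 and 1
    pred = [(0.0, 0.0), (10.0, 0.0), (5.0, 8.0)]  # one landmark off by 3 pixels
    print(dataset_nme([pred], [gt], left_eye=0, right_eye=1))
```

Normalizing by a face-size proxy makes errors comparable across images of different scale, which is why lower NME is better and values across datasets are reported on the same percentage scale.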

Methods