Deep High-Resolution Representation Learning for Visual Recognition

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Instead, our proposed network, named High-Resolution Network (HRNet), maintains high-resolution representations through the whole process. It has two key characteristics: (i) it connects the high-to-low resolution convolution streams in parallel; (ii) it repeatedly exchanges information across resolutions. The benefit is that the resulting representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. All code is available at https://github.com/HRNet.
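The two characteristics named in the abstract, parallel resolution streams and repeated exchange between them, can be illustrated with a toy sketch. This is not the authors' implementation. As simplifying assumptions, 2x2 average pooling stands in for the high-to-low path, nearest-neighbor repetition stands in for the low-to-high path, fusion is a plain sum, and all learned convolutions are omitted. The helper names (`downsample2x`, `upsample2x`, `exchange`) are hypothetical.

```python
# Toy sketch: two parallel resolution streams with repeated exchange.
# Assumptions (not from the paper): average pooling and nearest-neighbor
# repetition replace the learned resampling layers; no convolutions are used.

def downsample2x(fmap):
    """Halve height and width by averaging each 2x2 block (even sizes only)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w, 2)
        ]
        for i in range(0, h, 2)
    ]


def upsample2x(fmap):
    """Double height and width by repeating each value in a 2x2 block."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add(a, b):
    """Element-wise sum of two equally sized feature maps."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def exchange(high, low):
    """One fusion step: each stream receives the other, resampled to its own size."""
    new_high = add(high, upsample2x(low))
    new_low = add(low, downsample2x(high))
    return new_high, new_low


# A 4x4 high-resolution stream and a 2x2 low-resolution stream run in parallel.
high = [[float(4 * i + j) for j in range(4)] for i in range(4)]
low = downsample2x(high)
for _ in range(2):  # exchange information repeatedly
    high, low = exchange(high, low)
```

After any number of exchange steps, `high` is still 4x4 and `low` is still 2x2. The high-resolution stream is never discarded and later recovered, which is the contrast the abstract draws with the in-series designs.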


Results from the Paper


 Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

Task Dataset Model Metric Name Metric Value Global Rank
Instance Segmentation BDD100K val HRNet AP 22.5 # 2
Semantic Segmentation Cityscapes test HRNetV2 (train+val) Mean IoU (class) 81.6% # 38
Semantic Segmentation Cityscapes val HRNetV2 (HRNetV2-W48) mIoU 81.1 # 44
Semantic Segmentation Cityscapes val HRNetV2 (HRNetV2-W40) mIoU 80.2 # 54
Object Detection COCO minival Cascade R-CNN (HRNetV2p-W18) box AP 41.3 # 164
AP50 59.2 # 97
AP75 44.9 # 74
APS 23.7 # 69
APM 44.2 # 69
APL 54.1 # 72
Object Detection COCO minival HTC (HRNetV2p-W18) box AP 43.1 # 144
APS 26.6 # 42
APM 46.0 # 55
Object Detection COCO minival Mask R-CNN (HRNetV2p-W32) box AP 42.3 # 153
APS 25.0 # 59
APM 45.4 # 62
Object Detection COCO minival Cascade R-CNN (HRNetV2p-W48) box AP 44.6 # 124
AP50 62.7 # 69
AP75 48.7 # 42
APS 26.3 # 45
APM 48.1 # 39
APL 58.5 # 52
Object Detection COCO minival Cascade R-CNN (HRNetV2p-W32) box AP 43.7 # 136
AP50 61.7 # 76
AP75 47.7 # 49
APS 25.6 # 52
APM 46.5 # 52
APL 57.4 # 59
Object Detection COCO minival Faster R-CNN (HRNetV2p-W48) box AP 41.8 # 159
AP50 62.8 # 67
AP75 45.9 # 67
APM 44.7 # 64
APL 54.6 # 71
Object Detection COCO minival HTC (HRNetV2p-W32) box AP 45.3 # 113
APS 27.0 # 36
APM 48.4 # 32
APL 59.5 # 44
Instance Segmentation COCO minival HTC (HRNetV2p-W48) mask AP 41.0 # 68
Object Detection COCO minival Mask R-CNN (HRNetV2p-W18) box AP 39.2 # 189
APM 41.7 # 83
APL 51.0 # 85
Object Detection COCO minival Mask R-CNN (HRNetV2p-W32, cascade) APS 26.1 # 46
APM 47.9 # 42
Object Detection COCO minival Mask R-CNN (HRNetV2p-W48, cascade) box AP 46.0 # 108
APS 27.5 # 33
APL 60.1 # 40
Object Detection COCO minival HTC (HRNetV2p-W48) box AP 47.0 # 98
APS 28.8 # 26
APM 50.3 # 25
APL 62.2 # 31
Object Detection COCO minival Faster R-CNN (HRNetV2p-W18) box AP 38.0 # 198
AP50 58.9 # 101
AP75 41.5 # 94
APS 22.6 # 79
APM 40.8 # 86
APL 49.6 # 90
Object Detection COCO minival Faster R-CNN (HRNetV2p-W32) box AP 40.9 # 169
AP50 61.8 # 75
AP75 44.8 # 75
APS 24.4 # 63
APM 43.7 # 73
APL 53.3 # 75
Object Detection COCO test-dev Mask R-CNN (HRNetV2p-W48 + cascade) box mAP 46.1 # 127
AP50 64.0 # 96
AP75 50.3 # 80
APS 27.1 # 77
APM 48.6 # 74
APL 58.3 # 69
Hardware Burden 15G # 1
Operations per network pass 61.8G # 1
Object Detection COCO test-dev Cascade R-CNN (HRNetV2p-W48) AP75 48.6 # 86
APS 26.0 # 89
APM 47.3 # 85
APL 56.3 # 89
Object Detection COCO test-dev Faster R-CNN (HRNetV2p-W48) box mAP 42.4 # 172
AP50 63.6 # 100
AP75 46.4 # 113
APS 24.9 # 97
APM 44.6 # 115
APL 53.0 # 120
Hardware Burden 16G # 1
Operations per network pass 20.8G # 1
Object Detection COCO test-dev FCOS (HRNetV2p-W48) box mAP 40.5 # 190
AP50 59.3 # 140
APS 23.4 # 113
APM 42.6 # 130
APL 51.0 # 133
Hardware Burden 16G # 1
Operations per network pass 27.3G # 1
Object Detection COCO test-dev Mask R-CNN (HRNetV2p-W32 + cascade) AP50 62.5 # 111
AP75 48.6 # 86
APL 56.3 # 89
Hardware Burden 16G # 1
Operations per network pass 50.6G # 1
Object Detection COCO test-dev CenterNet (HRNetV2-W48) box mAP 43.5 # 154
AP75 46.5 # 111
APS 22.2 # 122
APL 57.8 # 74
Hardware Burden 16G # 1
Operations per network pass 21.7G # 1
Object Detection COCO test-dev HTC (HRNetV2p-W48) box mAP 47.3 # 117
AP50 65.9 # 73
AP75 51.2 # 67
APS 28.0 # 69
APM 49.7 # 65
APL 59.8 # 63
Hardware Burden 15G # 1
Operations per network pass 71.7G # 1
Face Alignment COFW-68 HRNetV2-W18 NME (inter-ocular) 5.06 # 6
Semantic Segmentation DADA-seg HRNet (ACDC) mIoU 27.5 # 9
Dichotomous Image Segmentation DIS-TE1 HRNet max F-Measure 0.668 # 11
weighted F-measure 0.579 # 10
MAE 0.088 # 10
S-Measure 0.742 # 10
E-measure 0.797 # 11
HCE 262 # 16
Dichotomous Image Segmentation DIS-TE2 HRNet max F-Measure 0.747 # 10
weighted F-measure 0.664 # 10
MAE 0.087 # 10
S-Measure 0.784 # 10
E-measure 0.840 # 7
HCE 555 # 14
Dichotomous Image Segmentation DIS-TE3 HRNet max F-Measure 0.784 # 10
weighted F-measure 0.700 # 10
MAE 0.080 # 10
S-Measure 0.805 # 9
E-measure 0.869 # 8
HCE 1049 # 13
Dichotomous Image Segmentation DIS-TE4 HRNet max F-Measure 0.772 # 10
weighted F-measure 0.687 # 10
MAE 0.092 # 10
S-Measure 0.792 # 10
E-measure 0.854 # 7
HCE 3864 # 20
Dichotomous Image Segmentation DIS-VD HRNet max F-Measure 0.726 # 12
weighted F-measure 0.641 # 10
MAE 0.095 # 11
S-Measure 0.767 # 12
E-measure 0.824 # 9
HCE 1560 # 14
Thermal Image Segmentation MFN Dataset HRNet mIOU 51.7 # 40
Semantic Segmentation PASCAL Context CFNet (ResNet-101) mIoU 54.0 # 35
Semantic Segmentation PASCAL Context HRNetV2 HRNetV2-W48 mIoU 54 # 35
Semantic Segmentation Potsdam HRNet-18 mIoU 84.02 # 2
Semantic Segmentation Potsdam HRNet-48 mIoU 84.22 # 1
Semantic Segmentation US3D HRNet-48 mIoU 72.66 # 2
Semantic Segmentation US3D HRNet-18 mIoU 60.33 # 3
Semantic Segmentation Vaihingen HRNet-48 mIoU 76.75 # 8
Semantic Segmentation Vaihingen HRNet-18 mIoU 75.90 # 10

Results from Other Papers


Task Dataset Model Metric Name Metric Value Rank
Face Alignment 300W HRNet NME_inter-ocular (%, Full) 3.32 # 24
NME_inter-ocular (%, Common) 2.87 # 21
NME_inter-ocular (%, Challenge) 5.15 # 24
Face Alignment COFW HRNet NME (inter-ocular) 3.45 # 12
Face Alignment WFLW HRNet NME (inter-ocular) 4.60 # 21

Methods