Deep High-Resolution Representation Learning for Visual Recognition

High-resolution representations are essential for position-sensitive vision problems such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation. Our proposed network, the High-Resolution Network (HRNet), instead maintains high-resolution representations throughout the whole process. It has two key characteristics: (i) the high-to-low resolution convolution streams are connected in parallel; (ii) information is repeatedly exchanged across resolutions. As a result, the representation is semantically richer and spatially more precise. We show the superiority of the proposed HRNet in a wide range of applications, including human pose estimation, semantic segmentation, and object detection, suggesting that HRNet is a stronger backbone for computer vision problems. The code is available at https://github.com/HRNet.
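The two characteristics named in the abstract, parallel streams and repeated cross-resolution exchange, can be illustrated with a toy sketch. This is not the authors' implementation: it uses 1-D lists instead of feature maps, replaces the learned strided and 1x1 convolutions with fixed pair-averaging and value repetition, and omits the convolution units that would run inside each stream between exchanges. The function names (`downsample`, `upsample`, `exchange`, `run_streams`) are hypothetical and chosen only for this example.

```python
def downsample(x):
    """Halve the resolution by averaging adjacent pairs.

    Assumption: stands in for a learned strided convolution.
    Expects an even-length list.
    """
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample(x):
    """Double the resolution by repeating each value (nearest-neighbour)."""
    return [v for v in x for _ in range(2)]


def exchange(high, low):
    """Fuse two parallel streams.

    Each output stream is its own input plus the other stream
    resampled to the matching resolution.
    """
    new_high = [h + u for h, u in zip(high, upsample(low))]
    new_low = [l + d for l, d in zip(low, downsample(high))]
    return new_high, new_low


def run_streams(signal, num_exchanges=3):
    """Keep a full-resolution stream alive while exchanging with a half-resolution one."""
    high = list(signal)          # high-resolution stream, never downsampled
    low = downsample(high)       # low-resolution stream runs in parallel
    for _ in range(num_exchanges):
        high, low = exchange(high, low)
    return high, low


high, low = run_streams([1.0, 2.0, 3.0, 4.0], num_exchanges=1)
print(len(high), len(low))  # 4 2
```

The point of the sketch is structural: the high-resolution stream keeps its original length after every exchange, rather than being reduced first and recovered afterwards as in a series design.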


Results from the Paper


 Ranked #1 on Object Detection on COCO test-dev (Hardware Burden metric)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Instance Segmentation | BDD100K | HRNet | AP | 22.5 | # 2 |
| Instance Segmentation | BDD100K val | HRNet | AP | 22.5 | # 2 |
| Semantic Segmentation | Cityscapes test | HRNetV2 (train+val) | Mean IoU (class) | 81.6 | # 30 |
| Semantic Segmentation | Cityscapes val | HRNetV2 (HRNetV2-W48) | mIoU | 81.1 | # 19 |
| Object Detection | COCO minival | HTC (HRNetV2p-W48) | box AP | 47.0 | # 57 |
| | | | APS | 28.8 | # 18 |
| | | | APM | 50.3 | # 16 |
| | | | APL | 62.2 | # 16 |
| Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W48, cascade) | box AP | 46.0 | # 65 |
| | | | APS | 27.5 | # 24 |
| | | | APL | 60.1 | # 20 |
| Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W32, cascade) | APS | 26.1 | # 36 |
| | | | APM | 47.9 | # 27 |
| Object Detection | COCO minival | Mask R-CNN (HRNetV2p-W18) | box AP | 39.2 | # 138 |
| | | | APM | 41.7 | # 67 |
| | | | APL | 51.0 | # 65 |
| Instance Segmentation | COCO minival | HTC (HRNetV2p-W48) | mask AP | 41.0 | # 40 |
| Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W48) | box AP | 41.8 | # 109 |
| | | | AP50 | 62.8 | # 44 |
| | | | AP75 | 45.9 | # 49 |
| | | | APM | 44.7 | # 48 |
| | | | APL | 54.6 | # 51 |
| Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W48) | box AP | 44.6 | # 78 |
| | | | AP50 | 62.7 | # 46 |
| | | | AP75 | 48.7 | # 28 |
| | | | APS | 26.3 | # 35 |
| | | | APM | 48.1 | # 24 |
| | | | APL | 58.5 | # 32 |
| Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W18) | box AP | 41.3 | # 114 |
| | | | AP50 | 59.2 | # 74 |
| | | | AP75 | 44.9 | # 55 |
| | | | APS | 23.7 | # 55 |
| | | | APM | 44.2 | # 53 |
| | | | APL | 54.1 | # 52 |
| Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W32) | box AP | 40.9 | # 118 |
| | | | AP50 | 61.8 | # 52 |
| | | | AP75 | 44.8 | # 56 |
| | | | APS | 24.4 | # 50 |
| | | | APM | 43.7 | # 57 |
| | | | APL | 53.3 | # 55 |
| Object Detection | COCO minival | Faster R-CNN (HRNetV2p-W18) | box AP | 38.0 | # 147 |
| | | | AP50 | 58.9 | # 78 |
| | | | AP75 | 41.5 | # 74 |
| | | | APS | 22.6 | # 63 |
| | | | APM | 40.8 | # 70 |
| | | | APL | 49.6 | # 70 |
| Object Detection | COCO minival | HTC (HRNetV2p-W32) | box AP | 45.3 | # 69 |
| | | | APS | 27.0 | # 26 |
| | | | APM | 48.4 | # 20 |
| | | | APL | 59.5 | # 24 |
| Object Detection | COCO minival | Cascade R-CNN (HRNetV2p-W32) | box AP | 43.7 | # 87 |