Deep High-Resolution Representation Learning for Visual Recognition

20 Aug 2019Jingdong WangKe SunTianheng ChengBorui JiangChaorui DengYang ZhaoDong LiuYadong MuMingkui TanXinggang WangWenyu LiuBin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation... (read more)

PDF Abstract

Evaluation Results from the Paper


TASK DATASET MODEL METRIC NAME METRIC VALUE GLOBAL RANK COMPARE
Object Detection COCO test-dev HTC (HRNetV2p-W48) box AP 47.3 # 4
Object Detection COCO test-dev HTC (HRNetV2p-W48) AP50 65.9 # 9
Object Detection COCO test-dev HTC (HRNetV2p-W48) AP75 51.2 # 6
Object Detection COCO test-dev HTC (HRNetV2p-W48) APS 28.0 # 10
Object Detection COCO test-dev HTC (HRNetV2p-W48) APM 49.7 # 5
Object Detection COCO test-dev HTC (HRNetV2p-W48) APL 59.8 # 4
Semantic Segmentation PASCAL Context HRNetV2 (HRNetV2-W48) mIoU 54.0 # 3