Lite-HRNet: A Lightweight High-Resolution Network

We present an efficient high-resolution network, Lite-HRNet, for human pose estimation. We start by simply applying the efficient shuffle block in ShuffleNet to HRNet (high-resolution network), yielding stronger performance than popular lightweight networks such as MobileNet, ShuffleNet, and Small HRNet. We find that the heavily used pointwise (1x1) convolutions in shuffle blocks become the computational bottleneck. We introduce a lightweight unit, conditional channel weighting, to replace the costly pointwise (1x1) convolutions in shuffle blocks. The complexity of channel weighting is linear with respect to the number of channels, lower than the quadratic time complexity of pointwise convolutions. Our solution learns the weights from all the channels and over multiple resolutions that are readily available in the parallel branches of HRNet. It uses the weights as the bridge to exchange information across channels and resolutions, compensating for the role played by the pointwise (1x1) convolution. Lite-HRNet demonstrates superior results on human pose estimation over popular lightweight networks. Moreover, Lite-HRNet can be easily applied to the semantic segmentation task in the same lightweight manner. The code and models have been publicly available at
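The linear-versus-quadratic claim can be illustrated with a rough operation count. The sketch below is not taken from the paper's code: the function names and the 64x48 feature-map size are hypothetical, and it counts only the cost of applying per-element weights, ignoring the cost of computing those weights, which the abstract does not break down.

```python
def pointwise_conv_macs(height, width, channels):
    """Multiply-accumulates for a 1x1 convolution mapping C channels to C channels.

    Every output element mixes all C input channels, so the cost is H * W * C * C.
    """
    return height * width * channels * channels


def channel_weighting_macs(height, width, channels):
    """Multiplications for scaling each feature-map element by one weight.

    One multiplication per element, so the cost is H * W * C.
    (Assumption: the weights are already computed; that cost is not counted here.)
    """
    return height * width * channels


def cost_ratio(height, width, channels):
    """How many times more expensive the 1x1 convolution is; this equals C."""
    return pointwise_conv_macs(height, width, channels) // channel_weighting_macs(
        height, width, channels
    )


if __name__ == "__main__":
    # Hypothetical feature-map size and channel widths, chosen only for illustration.
    height, width = 64, 48
    for channels in (40, 80, 160, 320):
        conv = pointwise_conv_macs(height, width, channels)
        weighting = channel_weighting_macs(height, width, channels)
        print(channels, conv, weighting, cost_ratio(height, width, channels))
```

Under these assumptions, the gap between the two costs grows in proportion to the channel count, which is why replacing pointwise convolutions pays off most in the wider branches.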

CVPR 2021

Results from the Paper

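| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | AP | 69.7 | #33 |
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | AP50 | 90.7 | #27 |
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | AP75 | 77.5 | #29 |
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | APL | 75.0 | #29 |
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | APM | 66.9 | #26 |
| Pose Estimation | COCO test-dev | Lite-HRNet-30 | AR | 75.4 | #27 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | AP | 66.9 | #36 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | AP50 | 89.4 | #32 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | AP75 | 74.4 | #32 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | APL | 72.2 | #31 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | APM | 64.0 | #30 |
| Pose Estimation | COCO test-dev | Lite-HRNet-18 | AR | 72.6 | #28 |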