Hardware-Aware Network Transformation

29 Sep 2021  ·  Pavlo Molchanov, Jimmy Hall, Hongxu Yin, Jan Kautz, Nicolo Fusi, Arash Vahdat

In this paper, we tackle the problem of network acceleration by proposing hardware-aware network transformation (HANT), an approach that builds on neural architecture search techniques and teacher-student distillation. HANT consists of two phases: in the first phase, it trains many alternative operations for every layer of the teacher network using layer-wise feature map distillation. In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization approach. In extensive experiments, we show that HANT can successfully accelerate three different families of network architectures (EfficientNetV1, EfficientNetV2, and ResNeSt) on two different target hardware platforms with minimal loss of accuracy. For example, HANT accelerates EfficientNetV1-B6 by 3.6× with a <0.4% drop in top-1 accuracy on ImageNet. At the same latency level, HANT accelerates EfficientNetV1-B4 to the latency of EfficientNetV1-B1 while achieving 3% higher accuracy. We also show that applying HANT to EfficientNetV1 results in the automated discovery of the same (qualitative) architecture modifications later incorporated in EfficientNetV2. Finally, HANT’s efficient search allows us to examine a large pool of 197 operations per layer, resulting in new insights into the accuracy-latency tradeoffs for different operations.
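The two phases described above can be illustrated with short sketches. The first sketch shows layer-wise feature map distillation for a single candidate operation: a frozen teacher layer provides the regression target, and the candidate op is fit to reproduce its output. This is a minimal sketch, not the paper's implementation; the function name, the `feature_batches` iterable of cached teacher inputs, and the hyperparameters are hypothetical.

```python
import torch
import torch.nn as nn

def distill_layer(teacher_layer: nn.Module,
                  candidate_op: nn.Module,
                  feature_batches,            # iterable of input feature maps for this layer (assumed cached)
                  lr: float = 1e-3,
                  epochs: int = 1) -> float:
    """Fit one candidate op to mimic a frozen teacher layer's output feature maps.

    Returns the final reconstruction error, which can serve as the per-layer
    cost of choosing this op in the selection phase.
    """
    teacher_layer.eval()
    opt = torch.optim.Adam(candidate_op.parameters(), lr=lr)
    mse = nn.MSELoss()
    last_loss = float("inf")
    for _ in range(epochs):
        for x in feature_batches:
            with torch.no_grad():
                target = teacher_layer(x)        # frozen teacher output
            loss = mse(candidate_op(x), target)  # layer-wise feature map distillation
            opt.zero_grad()
            loss.backward()
            opt.step()
            last_loss = loss.item()
    return last_loss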

The second sketch shows the selection phase as a small constrained integer linear program: pick exactly one operation per layer so that the summed layer-wise distillation errors are minimized while the summed measured latencies stay within a budget. PuLP is used here only as an assumed off-the-shelf solver; the paper's exact objective and constraints may differ.

```python
import pulp

def select_ops(distill_error, latency, latency_budget):
    """distill_error[i][j], latency[i][j]: cost and measured latency of op j at layer i."""
    num_layers, num_ops = len(distill_error), len(distill_error[0])
    prob = pulp.LpProblem("op_selection", pulp.LpMinimize)
    z = [[pulp.LpVariable(f"z_{i}_{j}", cat="Binary")
          for j in range(num_ops)] for i in range(num_layers)]

    # Objective: total layer-wise reconstruction error of the chosen ops.
    prob += pulp.lpSum(distill_error[i][j] * z[i][j]
                       for i in range(num_layers) for j in range(num_ops))
    # Exactly one op per layer.
    for i in range(num_layers):
        prob += pulp.lpSum(z[i]) == 1
    # Hardware constraint: total latency of the transformed network stays within budget.
    prob += pulp.lpSum(latency[i][j] * z[i][j]
                       for i in range(num_layers) for j in range(num_ops)) <= latency_budget

    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    # Return the index of the selected op for each layer.
    return [max(range(num_ops), key=lambda j: pulp.value(z[i][j]))
            for i in range(num_layers)]
```

Because the per-layer errors and latencies are measured once offline, the same integer program can be re-solved cheaply for different latency budgets or target hardware platforms.