Fast and Efficient Once-For-All Networks for Diverse Hardware Deployment

29 Sep 2021 · Jun Fang, Li Yang, Chengyao Shen, Hamzah Abdel-Aziz, David Thorsley, Joseph Hassoun

Convolutional neural networks are widely used in practical applications across many diverse environments. Each environment requires a differently optimized network to maximize accuracy under its unique hardware constraints and latency requirements. To find models for this varied array of potential deployment targets, once-for-all (OFA) was introduced as a way to co-train many models simultaneously while keeping the total training cost constant. However, that total training cost is still very high, requiring up to 1200 GPU-hours. Compound OFA (compOFA) decreased the training cost of OFA by 2$\times$ by coupling model dimensions, which reduces the search space of possible models by orders of magnitude while also simplifying the training procedure. In this work, we continue the effort to reduce the training cost of OFA methods. While both OFA and compOFA rely on a pre-trained teacher network, we propose an in-place knowledge distillation procedure that trains the super-network simultaneously with the sub-networks. Within this in-place distillation framework, we develop an upper-attentive sampling technique that reduces the training cost per epoch while maintaining accuracy. Through experiments on ImageNet, we demonstrate a $2\times$ - $3\times$ reduction in training time compared to the state-of-the-art OFA ($1.5\times$ - $1.8\times$ compared to compOFA) without loss of optimality.
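The abstract does not spell out the training loop, but the idea of in-place knowledge distillation can be illustrated with a minimal sketch: in each step, the largest configuration of the super-network is trained on the hard labels and simultaneously serves as the teacher for a few sampled sub-networks, so no separately pre-trained teacher is needed. The super-network API below (`set_active_config`, `max_config`, `sample_config`) is purely illustrative and not taken from the paper; likewise, the paper's upper-attentive sampling would replace the uniform sub-network sampling assumed here.

```python
# Hypothetical sketch of one in-place knowledge distillation step for an
# OFA-style super-network. The config-handling methods on `supernet` are
# assumed for illustration only.
import torch
import torch.nn.functional as F

def inplace_kd_step(supernet, images, labels, optimizer,
                    num_subnets=2, temperature=1.0):
    optimizer.zero_grad()

    # 1) Train the largest network (the full super-network) on hard labels.
    supernet.set_active_config(supernet.max_config())
    teacher_logits = supernet(images)
    loss = F.cross_entropy(teacher_logits, labels)

    # Soft targets come from the super-network itself (detached), so no
    # separate pre-trained teacher is required.
    soft_targets = F.softmax(teacher_logits.detach() / temperature, dim=1)

    # 2) Train a few sampled sub-networks against those soft targets.
    #    An upper-attentive scheme would bias this sampling toward larger
    #    sub-networks; here we just call a generic sampler.
    for _ in range(num_subnets):
        supernet.set_active_config(supernet.sample_config())
        student_logits = supernet(images)
        loss = loss + (temperature ** 2) * F.kl_div(
            F.log_softmax(student_logits / temperature, dim=1),
            soft_targets,
            reduction="batchmean",
        )

    loss.backward()
    optimizer.step()
    return loss.item()
```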
