On the Efficiency of Deep Neural Networks

29 Sep 2021  ·  Yibin Liang, Yang Yi, Lingjia Liu

The efficiency of neural networks is critical in large-scale deployment scenarios such as mobile applications, the Internet of Things, and edge computing. For a given performance requirement, an efficient neural network should use the simplest network architecture with a minimal number of parameters and connections. In this paper, we discuss several key issues and a new procedure for obtaining efficient networks that minimize the total number of parameters and the computation requirement. Our first contribution is identifying and analyzing several key components of training efficient networks with the backpropagation (BP) algorithm: 1) softmax normalization in output layers may be one major cause of parameter explosion; 2) using a log-likelihood-ratio (LLR) representation in output layers can reduce overfitting; 3) weight decay and structural regularization can effectively reduce overfitting when ReLU activation is used. The second contribution is discovering that a well-trained network without overfitting can be effectively pruned using a simple snapshot-based procedure: after pruning unimportant weights and connections, simply adjust the remaining non-weight parameters using the BP algorithm. The snapshot-based pruning method can also be used to evaluate and analyze the efficiency of neural networks. Finally, we hypothesize that, for a given optimization problem, there exist lower bounds on the total number of bits needed to represent parameters and connections with respect to a performance metric. Rather than focusing on improving the accuracy metric alone with ever more complex network architectures, we should also explore the trade-off between accuracy and the total number of representation bits when comparing different network architectures and implementations.
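The abstract's snapshot-based pruning procedure can be illustrated with a minimal sketch: prune low-magnitude weights, freeze the remaining weights, and fine-tune only the non-weight parameters (here, the biases) with backpropagation. The PyTorch framework, the magnitude-based pruning criterion, the 90% sparsity level, and the toy network are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn as nn

def snapshot_prune(model: nn.Module, sparsity: float = 0.9) -> None:
    """Zero out the smallest-magnitude weights and freeze all weight tensors."""
    for name, param in model.named_parameters():
        if name.endswith("weight"):
            threshold = torch.quantile(param.detach().abs(), sparsity)
            mask = (param.detach().abs() >= threshold).float()
            param.data.mul_(mask)          # prune unimportant weights
            param.requires_grad_(False)    # keep the pruned snapshot fixed

def finetune_non_weight_params(model, loader, epochs=1, lr=1e-3):
    """Adjust only the remaining trainable (non-weight) parameters with BP."""
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(trainable, lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()

# Illustrative usage on a toy network with random data (hypothetical example).
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10))
data = [(torch.randn(32, 20), torch.randint(0, 10, (32,))) for _ in range(10)]
snapshot_prune(model, sparsity=0.9)
finetune_non_weight_params(model, data, epochs=1)
```

The key design choice in this reading of the procedure is that pruned and surviving weights are both frozen after the snapshot, so backpropagation only redistributes the biases (and, in a real network, batch-normalization parameters) to compensate for the removed connections.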
