TESLA: Task-wise Early Stopping and Loss Aggregation for Dynamic Neural Network Inference

ICLR 2018 · Chun-Min Chang, Chia-Ching Lin, Hung-Yi Ou Yang, Chin-Laung Lei, Kuan-Ta Chen ·

For inference operations in deep neural networks on end devices, it is desirable to deploy a single pre-trained neural network model, which can dynamically scale across a computation range without comprising accuracy. To achieve this goal, Incomplete Dot Product (IDP) has been proposed to use only a subset of terms in dot products during forward propagation. However, there are some limitations, including noticeable performance degradation in operating regions with low computational costs, and essential performance limitations since IDP uses hand-crafted profile coefficients. In this paper, we extend IDP by proposing new training algorithms involving a single profile, which may be trainable or pre-determined, to significantly improve the overall performance, especially in operating regions with low computational costs. Specifically, we propose the Task-wise Early Stopping and Loss Aggregation (TESLA) algorithm, which is showed in our 3-layer multilayer perceptron on MNIST that outperforms the original IDP by 32\% when only 10\% of dot products terms are used and achieves 94.7\% accuracy on average. By introducing trainable profile coefficients, TESLA further improves the accuracy to 95.5\% without specifying coefficients in advance. Besides, TESLA is applied to the VGG-16 model, which achieves 80\% accuracy using only 20\% of dot product terms on CIFAR-10 and also keeps 60\% accuracy using only 30\% of dot product terms on CIFAR-100, but the original IDP performs like a random guess in these two datasets at such low computation costs. Finally, we visualize the learned representations at different dot product percentages by class activation map and show that, by applying TESLA, the learned representations can adapt over a wide range of operation regions.

PDF Abstract

Code

Add Remove Mark official

No code implementations yet. Submit your code now

Tasks

Add Remove

Datasets

Add Datasets introduced or used in this paper

Results from the Paper

Add Remove

Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods

Add Remove

Early Stopping • VGG-16

Edit Social Preview

TESLA: Task-wise Early Stopping and Loss Aggregation for Dynamic Neural Network Inference

Code Edit Add Remove Mark official

Tasks Edit Add Remove

Datasets Edit

Results from the Paper Edit Add Remove

Methods Edit Add Remove

Code

Add Remove Mark official

Tasks

Add Remove

Datasets

Results from the Paper

Add Remove

Methods

Add Remove