We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: a Transformer trained on WMT 2014 English-to-German translation and a ResNet-50 trained on ImageNet classification.
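To make this concrete, one widely used technique for inducing sparsity is magnitude pruning, which zeroes the fraction of weights with the smallest absolute values. The sketch below is a minimal one-shot PyTorch illustration (the `magnitude_prune` helper and the 80% target are assumptions chosen for illustration; practical schedules usually prune gradually over the course of training):

```python
import torch

def magnitude_prune(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """Return a mask that zeroes the `sparsity` fraction of weights
    with the smallest magnitudes (a one-shot illustrative sketch)."""
    num_prune = int(weight.numel() * sparsity)
    if num_prune == 0:
        return torch.ones_like(weight)
    # Threshold at the k-th smallest absolute value.
    threshold = weight.abs().flatten().kthvalue(num_prune).values
    return (weight.abs() > threshold).to(weight.dtype)

# Usage: sparsify a linear layer to roughly 80% zeros.
layer = torch.nn.Linear(512, 512)
with torch.no_grad():
    mask = magnitude_prune(layer.weight, sparsity=0.8)
    layer.weight.mul_(mask)
```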
Smaller DNNs also require less bandwidth to export a new model from the cloud to an autonomous car.
Deep neural networks (DNNs) continue to make significant advances, solving tasks ranging from image classification to translation and reinforcement learning.
Model compression is a critical technique for efficiently deploying neural network models on mobile devices, which have limited computational resources and tight power budgets.
Deep neural networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical use on resource-constrained front-end devices.
Channel pruning is one of the predominant approaches for deep model compression.
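In contrast to unstructured sparsity, channel pruning removes entire filters, yielding a smaller dense network that needs no special sparse kernels. The sketch below assumes a simple L1-norm heuristic for ranking channels (an illustrative choice, not the method of any particular paper), and the `keep_channels` helper is hypothetical:

```python
import torch
import torch.nn as nn

def l1_channel_scores(conv: nn.Conv2d) -> torch.Tensor:
    """Score each output channel by the L1 norm of its filter weights."""
    # weight shape: (out_channels, in_channels, kH, kW)
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

def keep_channels(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Build a smaller Conv2d keeping only the highest-scoring channels."""
    scores = l1_channel_scores(conv)
    num_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = scores.topk(num_keep).indices
    pruned = nn.Conv2d(conv.in_channels, num_keep,
                       kernel_size=conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

# Usage: halve the output channels of a convolution.
conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
smaller = keep_channels(conv, keep_ratio=0.5)
```

Note that pruning a layer's output channels also shrinks the next layer's input channels, so a full pipeline must propagate the kept indices through the network and typically fine-tunes the pruned model afterwards.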
This paper aims to create extremely small and fast convolutional neural networks (CNNs) for facial expression recognition (FER) from frontal face images.
While these results are desirable, finding the best compression strategy for a given neural network, target platform, and optimization objective often requires extensive experimentation.