Cross-layer filter comparison is infeasible because the importance is defined locally within each layer.
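As an illustration of why locally defined scores do not transfer across layers, here is a minimal PyTorch sketch (the two-layer setup and the L1-norm criterion are assumptions made for this example, not details from the excerpt):

```python
import torch
import torch.nn as nn

# Hypothetical two-layer network; the L1-norm importance criterion is illustrative.
conv1 = nn.Conv2d(3, 64, kernel_size=3)
conv2 = nn.Conv2d(64, 128, kernel_size=3)

def filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    # Importance of each output filter as the L1 norm of its weights,
    # computed *within* the layer only.
    return conv.weight.detach().abs().sum(dim=(1, 2, 3))

s1 = filter_importance(conv1)  # shape [64]
s2 = filter_importance(conv2)  # shape [128]

# The two score vectors live on different scales (different fan-in, different
# weight magnitudes), so ranking s1 and s2 jointly to pick a global set of
# filters to prune is not meaningful without cross-layer normalization.
print(s1.mean().item(), s2.mean().item())
```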
Dropout has played an essential role in many successful deep neural networks by regularizing model training.
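A minimal NumPy sketch of the mechanism, assuming the standard inverted-dropout formulation (illustrative only):

```python
import numpy as np

def dropout(x: np.ndarray, p: float = 0.5, training: bool = True) -> np.ndarray:
    """Inverted dropout: zero each unit with probability p during training and
    rescale the survivors so the expected activation is unchanged."""
    if not training or p == 0.0:
        return x
    mask = (np.random.rand(*x.shape) >= p).astype(x.dtype)
    return x * mask / (1.0 - p)

x = np.ones((2, 4), dtype=np.float32)
print(dropout(x, p=0.5))                   # roughly half the entries zeroed, rest = 2.0
print(dropout(x, p=0.5, training=False))   # identity at inference time
```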
We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: a Transformer trained on WMT 2014 English-to-German and a ResNet-50 trained on ImageNet.
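As a concrete example of one widely used sparsity-inducing technique, here is a sketch of magnitude-based weight pruning in PyTorch (the global-threshold variant and the 90% target are assumptions for illustration; the excerpt does not name the techniques it evaluates):

```python
import torch
import torch.nn as nn

def magnitude_prune(module: nn.Module, sparsity: float = 0.9) -> None:
    """Zero the smallest-magnitude weights so that `sparsity` fraction is zero.
    Illustrative one-shot variant; the gradual schedules and per-layer budgets
    used in practice are omitted."""
    with torch.no_grad():
        for p in module.parameters():
            if p.dim() < 2:          # skip biases and norm parameters
                continue
            k = int(sparsity * p.numel())
            if k == 0:
                continue
            threshold = p.abs().flatten().kthvalue(k).values
            p.mul_((p.abs() > threshold).to(p.dtype))

layer = nn.Linear(512, 512)
magnitude_prune(layer, sparsity=0.9)
print((layer.weight == 0).float().mean().item())  # roughly 0.9
```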
By studying the tradeoff between the decrease in generalization error and the increase in empirical risk under compression, we show that model compression can improve the population risk of a pre-trained model.
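The tradeoff can be made explicit with a standard risk decomposition (notation introduced here for illustration, not taken from the excerpt):

```latex
% R is the population risk, \hat{R} the empirical risk,
% f the pre-trained model, and f_c its compressed counterpart.
\[
R(f_c) - R(f)
  = \underbrace{\hat{R}(f_c) - \hat{R}(f)}_{\text{increase in empirical risk}}
  + \underbrace{\bigl(R(f_c) - \hat{R}(f_c)\bigr) - \bigl(R(f) - \hat{R}(f)\bigr)}_{\text{change in the generalization gap}}
\]
```

Compression therefore lowers the population risk exactly when the reduction in the generalization gap outweighs the rise in empirical risk.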
We reveal the relationship between input feature maps and 2D kernels in a theoretical framework and, building on it, propose a kernel sparsity and entropy (KSE) indicator that quantifies feature-map importance in a feature-agnostic manner to guide model compression.
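A schematic PyTorch sketch of a score in this spirit, combining per-input-channel kernel sparsity with an entropy of kernel norms (the particular combination below is a hypothetical stand-in, not the paper's exact KSE formula):

```python
import torch
import torch.nn as nn

def kse_like_score(conv: nn.Conv2d, alpha: float = 1.0, bins: int = 16) -> torch.Tensor:
    """Illustrative kernel-sparsity-and-entropy style score per *input* channel.
    Sparsity: total L1 mass of the 2D kernels acting on that channel.
    Entropy: Shannon entropy of a histogram of those kernels' L1 norms."""
    w = conv.weight.detach()                  # [out_c, in_c, kH, kW]
    kernel_l1 = w.abs().sum(dim=(2, 3))       # [out_c, in_c]
    sparsity = kernel_l1.sum(dim=0)           # [in_c]
    scores = []
    for c in range(w.shape[1]):
        hist = torch.histc(kernel_l1[:, c], bins=bins)
        p = hist / hist.sum()
        entropy = -(p[p > 0] * p[p > 0].log()).sum()
        scores.append(sparsity[c] / (1.0 + alpha * entropy))
    return torch.stack(scores)                # higher score = more important channel

conv = nn.Conv2d(64, 128, kernel_size=3)
print(kse_like_score(conv).shape)             # torch.Size([64])
```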
Our GAN-assisted TSC (GAN-TSC) significantly improves student accuracy for expensive models such as large random forests and deep neural networks on both tabular and image datasets.
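For context, a minimal teacher-student compression step in PyTorch (the models, the KL-based distillation loss, and the random inputs standing in for GAN-generated samples are all assumptions for illustration):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical teacher (large) and student (small) classifiers.
teacher = nn.Sequential(nn.Linear(32, 256), nn.ReLU(), nn.Linear(256, 10)).eval()
student = nn.Sequential(nn.Linear(32, 32), nn.ReLU(), nn.Linear(32, 10))
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

T = 2.0  # softmax temperature
for _ in range(100):
    # Stand-in for generator samples: in a GAN-assisted setup these would come
    # from a generator trained to mimic the data distribution.
    x = torch.randn(64, 32)
    with torch.no_grad():
        t_logits = teacher(x)
    s_logits = student(x)
    # The student matches the teacher's softened output distribution.
    loss = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                    F.softmax(t_logits / T, dim=1),
                    reduction="batchmean") * T * T
    opt.zero_grad()
    loss.backward()
    opt.step()
```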
Overall, our robust, cross-device implementation for keyword spotting realizes a new paradigm for serving neural network applications, and one of our slim models reduces latency by 66% with only a 4% drop in accuracy (from 94% to 90%).
Channel pruning is one of the predominant approaches for deep model compression.
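A minimal PyTorch sketch of one common channel-pruning criterion, L1-norm filter selection (the criterion, the layer pair, and the omission of batch-norm handling and fine-tuning are simplifications for illustration): filters with the smallest norms are removed from one conv layer, and the matching input channels are removed from the next.

```python
import torch
import torch.nn as nn

def prune_channels(conv: nn.Conv2d, next_conv: nn.Conv2d, keep: int):
    """Keep the `keep` output channels of `conv` with the largest L1 norm and
    drop the corresponding input channels of `next_conv`."""
    with torch.no_grad():
        importance = conv.weight.abs().sum(dim=(1, 2, 3))     # per output filter
        idx = torch.argsort(importance, descending=True)[:keep]
        new_conv = nn.Conv2d(conv.in_channels, keep, conv.kernel_size,
                             stride=conv.stride, padding=conv.padding,
                             bias=conv.bias is not None)
        new_conv.weight.copy_(conv.weight[idx])
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias[idx])
        new_next = nn.Conv2d(keep, next_conv.out_channels, next_conv.kernel_size,
                             stride=next_conv.stride, padding=next_conv.padding,
                             bias=next_conv.bias is not None)
        new_next.weight.copy_(next_conv.weight[:, idx])
        if next_conv.bias is not None:
            new_next.bias.copy_(next_conv.bias)
    return new_conv, new_next

c1, c2 = nn.Conv2d(3, 64, 3, padding=1), nn.Conv2d(64, 128, 3, padding=1)
p1, p2 = prune_channels(c1, c2, keep=32)
print(p1.weight.shape, p2.weight.shape)  # [32, 3, 3, 3], [128, 32, 3, 3]
```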
Making deep convolutional neural networks more accurate typically comes at the cost of increased computational and memory resources.
Recent developments in deep learning applied to language modeling have led to success in text processing, summarization, and machine translation.