Model Compression

241 papers with code • 0 benchmarks • 1 dataset

Model compression has been an actively pursued research area over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
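
As a rough illustration of weight quantization, one of the methods named in the description above, the sketch below maps floating-point weights to 8-bit integer codes with an affine scale and then reconstructs them. The function names `quantize_uniform` and `dequantize_uniform` and the sample weights are hypothetical, chosen only for this example; the scheme is a generic uniform quantizer, not the method of any paper listed on this page.

```python
def quantize_uniform(weights, num_bits=8):
    """Map floats to integer codes in [0, 2**num_bits - 1] using an affine scale."""
    lo, hi = min(weights), max(weights)
    levels = 2 ** num_bits - 1
    # Guard against a constant list, where hi == lo would give a zero scale.
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]
    return codes, scale, lo


def dequantize_uniform(codes, scale, lo):
    """Reconstruct approximate float weights from integer codes."""
    return [c * scale + lo for c in codes]


# Illustrative weights (made up for this sketch).
weights = [-0.82, -0.11, 0.0, 0.37, 0.95]
codes, scale, lo = quantize_uniform(weights, num_bits=8)
restored = dequantize_uniform(codes, scale, lo)

# The reconstruction error of each weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Storing `codes` as 8-bit integers instead of 32-bit floats would cut the storage for these weights by roughly a factor of four, at the cost of the rounding error measured by `max_error`.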


Most implemented papers

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

DeepScale/SqueezeNet 24 Feb 2016

Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car.

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

google-research/bert ICLR 2020

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

mit-han-lab/amc ECCV 2018

Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets.

Model compression via distillation and quantization

antspy/quantized_distillation ICLR 2018

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning.

The State of Sparsity in Deep Neural Networks

ars-ashuha/variational-dropout-sparsifies-dnn 25 Feb 2019

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

DingXiaoH/GSM-SGD NeurIPS 2019

Deep Neural Networks (DNNs) are powerful but computationally expensive and memory intensive, which impedes their practical use on resource-constrained front-end devices.

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

microsoft/NeuralSpeech 8 Feb 2021

Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

eparisotto/ActorMimic 19 Nov 2015

The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent.

Ternary Weight Networks

fengfu-chris/caffe-twns 16 May 2016

We present memory- and computation-efficient ternary weight networks (TWNs), with weights constrained to +1, 0 and -1.
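
The constraint to +1, 0 and -1 can be sketched as a thresholding step. The snippet below is a minimal illustration under stated assumptions: the function name `ternarize`, the sample weights, and the threshold rule (0.7 times the mean absolute weight, with a single scale factor taken as the mean magnitude of the surviving weights) are not given in the text above and are assumed here as one plausible heuristic.

```python
def ternarize(weights, delta_factor=0.7):
    """Return ternary codes in {-1, 0, +1} and a single scale factor alpha."""
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    # Assumed heuristic: threshold proportional to the mean absolute weight.
    delta = delta_factor * mean_abs
    ternary = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    # Scale factor: mean magnitude of the weights that were not zeroed.
    kept = [abs(w) for w, t in zip(weights, ternary) if t != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return ternary, alpha


# Illustrative weights (made up for this sketch).
weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3]
ternary, alpha = ternarize(weights)

# Approximate reconstruction: each weight becomes alpha * {-1, 0, +1}.
approx = [alpha * t for t in ternary]
```

Each ternary code needs only two bits, and multiplications by the codes reduce to sign changes or skips, which is where the memory and computation savings come from.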

To prune, or not to prune: exploring the efficacy of pruning for model compression

intellabs/model-compression-research-package ICLR 2018

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model.
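
A minimal sketch of how pruning induces sparsity: zero out the fraction of weights with the smallest magnitudes, leaving the rest untouched. The function name `magnitude_prune`, the target sparsity, and the sample weights are hypothetical; this is plain one-shot magnitude pruning shown for illustration, not necessarily the procedure used in the paper listed above.

```python
def magnitude_prune(weights, sparsity):
    """Zero the `sparsity` fraction of weights with the smallest absolute value."""
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    num_to_prune = int(len(weights) * sparsity)
    # Indices ordered from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:num_to_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


# Illustrative weights (made up for this sketch).
weights = [0.9, -0.05, 0.4, -0.7, 0.02, -0.3, 0.15, 0.6]
pruned = magnitude_prune(weights, sparsity=0.5)

# Count the parameters that remain nonzero after pruning.
nonzero = sum(1 for w in pruned if w != 0.0)
```

With a sparse storage format, only the `nonzero` surviving values and their positions need to be kept, which is the source of the size reduction.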