Model Compression

401 papers with code • 2 benchmarks • 3 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power and resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to compress the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
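As a rough illustration of the three techniques named above, the sketch below applies magnitude pruning, a truncated-SVD low-rank factorization, and uniform 8-bit weight quantization to a single weight matrix. The sparsity level, rank, and bit-width are illustrative choices, not values taken from any of the papers listed here.

    import torch

    torch.manual_seed(0)
    W = torch.randn(256, 256)  # a dense weight matrix from some layer

    # Parameter pruning: zero out the smallest-magnitude weights.
    threshold = W.abs().quantile(0.9)                    # keep the largest 10% (illustrative)
    W_pruned = torch.where(W.abs() >= threshold, W, torch.zeros_like(W))

    # Low-rank factorization: approximate W by a rank-r product of two thin matrices.
    r = 32                                               # illustrative rank
    U, S, Vh = torch.linalg.svd(W)
    W_lowrank = (U[:, :r] * S[:r]) @ Vh[:r, :]           # stores 2*256*r values instead of 256*256

    # Weight quantization: map weights to 8-bit integers with one scale per tensor.
    scale = W.abs().max() / 127
    W_int8 = torch.clamp((W / scale).round(), -127, 127).to(torch.int8)
    W_dequant = W_int8.float() * scale                   # dequantized view used at inference

    print("pruned nonzeros:", int((W_pruned != 0).sum()), "/", W.numel())
    print("low-rank error: ", float(torch.norm(W - W_lowrank) / torch.norm(W)))
    print("quant error:    ", float(torch.norm(W - W_dequant) / torch.norm(W)))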

Most implemented papers

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

DeepScale/SqueezeNet 24 Feb 2016

Among other advantages, smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car.

Well-Read Students Learn Better: On the Importance of Pre-training Compact Models

google-research/bert ICLR 2020

Recent developments in natural language representations have been accompanied by large and expensive models that leverage vast amounts of general-domain text through self-supervised pre-training.

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

ist-daslab/gptq 31 Oct 2022

In this paper, we address this challenge, and propose GPTQ, a new one-shot weight quantization method based on approximate second-order information, that is both highly-accurate and highly-efficient.
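The sketch below shows the core idea in heavily simplified form: the columns of a linear layer's weight matrix are quantized one at a time, and the quantization error is pushed onto the not-yet-quantized columns using second-order (inverse-Hessian) information estimated from calibration inputs. It corresponds roughly to the non-batched inner loop of GPTQ-style methods; the actual algorithm adds lazy batched updates, grouped scales, and other refinements.

    import torch

    def gptq_like_quantize(W, X, bits=4):
        """Greedy column-wise quantization with inverse-Hessian error compensation.

        W: (d_out, d_in) weight matrix of a linear layer.
        X: (n_samples, d_in) calibration inputs used to estimate the Hessian.
        Returns the de-quantized weight matrix.
        """
        d_out, d_in = W.shape
        W = W.clone().float()

        # Hessian of the layer-wise reconstruction loss ||X W^T - X Q^T||^2 is 2 X^T X.
        H = 2 * X.T @ X + 1e-2 * torch.eye(d_in)                  # damping keeps H invertible
        Hinv = torch.linalg.cholesky(torch.linalg.inv(H), upper=True)

        qmax = 2 ** (bits - 1) - 1
        scale = (W.abs().max(dim=1, keepdim=True).values / qmax).clamp(min=1e-8)

        Q = torch.zeros_like(W)
        for j in range(d_in):                                     # quantize one column at a time
            w = W[:, j]
            q = torch.clamp((w / scale[:, 0]).round(), -qmax, qmax) * scale[:, 0]
            Q[:, j] = q
            err = (w - q) / Hinv[j, j]
            # Spread the error over the remaining columns so later steps can absorb it.
            W[:, j + 1:] -= err.unsqueeze(1) * Hinv[j, j + 1:].unsqueeze(0)
        return Q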

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

mit-han-lab/amc ECCV 2018

Model compression is a critical technique for efficiently deploying neural network models on mobile devices, which have limited computation resources and tight power budgets.

Learned Step Size Quantization

zhutmost/lsq-net ICLR 2020

Deep networks that run with low-precision operations at inference time offer power and space advantages over high-precision alternatives, but must overcome the challenge of maintaining high accuracy as precision decreases.
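A minimal sketch of an LSQ-style quantizer follows: values are divided by a learnable step size, clipped, rounded with a straight-through estimator, and rescaled, with the paper's recommended gradient scaling applied to the step size. Function and parameter names are illustrative, not taken from the reference implementation.

    import torch

    def lsq_quantize(v, step, bits=4, signed=True):
        """LSQ-style quantizer: scale by a learnable step size, clip, round, rescale."""
        if signed:
            Qn, Qp = 2 ** (bits - 1), 2 ** (bits - 1) - 1        # clip range is [-Qn, Qp]
        else:
            Qn, Qp = 0, 2 ** bits - 1
        # Scale the step size's gradient by 1 / sqrt(numel * Qp), as recommended in the paper.
        g = 1.0 / (v.numel() * Qp) ** 0.5
        step = step * g + (step - step * g).detach()
        v_bar = torch.clamp(v / step, -Qn, Qp)
        v_bar = v_bar + (v_bar.round() - v_bar).detach()         # straight-through estimator
        return v_bar * step

    # Example: quantizing a weight tensor with a learnable step size
    # (initialized to 2*mean|w|/sqrt(Qp), the paper's suggestion; Qp = 7 for 4-bit signed).
    w = torch.randn(64, 64)
    step = torch.nn.Parameter(2 * w.abs().mean() / (7 ** 0.5))
    w_q = lsq_quantize(w, step)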

The State of Sparsity in Deep Neural Networks

ars-ashuha/variational-dropout-sparsifies-dnn 25 Feb 2019

We rigorously evaluate three state-of-the-art techniques for inducing sparsity in deep neural networks on two large-scale learning tasks: Transformer trained on WMT 2014 English-to-German, and ResNet-50 trained on ImageNet.

Ternary Weight Networks

fengfu-chris/caffe-twns 16 May 2016

We present memory- and computation-efficient ternary weight networks (TWNs), with weights constrained to +1, 0, and -1.
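Below is a minimal sketch of threshold-based ternarization in the spirit of TWNs, using the paper's approximate threshold 0.7 * E|W| and a scaling factor equal to the mean magnitude of the surviving weights. It uses a single per-tensor scale for simplicity, whereas practical implementations typically compute the scale per filter.

    import torch

    def ternarize(W):
        """TWN-style ternarization: map weights to {-alpha, 0, +alpha}."""
        delta = 0.7 * W.abs().mean()                   # approximate optimal threshold
        mask = (W.abs() > delta).float()
        alpha = (W.abs() * mask).sum() / mask.sum()    # mean magnitude of surviving weights
        return alpha * torch.sign(W) * mask

    W = torch.randn(128, 128)
    W_t = ternarize(W)                                 # entries lie in {-alpha, 0, +alpha}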

Model compression via distillation and quantization

antspy/quantized_distillation ICLR 2018

Deep neural networks (DNNs) continue to make significant advances, solving tasks from image classification to translation or reinforcement learning.
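Quantized distillation trains a low-precision student against a full-precision teacher; the sketch below shows the standard Hinton-style distillation loss that such approaches build on. The temperature and mixing weight are illustrative defaults, not values from the paper.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
        """Hinton-style distillation: soften both distributions with temperature T
        and mix the KL term with the ordinary cross-entropy on the true labels."""
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=-1),
            F.softmax(teacher_logits / T, dim=-1),
            reduction="batchmean",
        ) * (T * T)                                    # compensate for the 1/T^2 gradient scaling
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard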

To prune, or not to prune: exploring the efficacy of pruning for model compression

intellabs/model-compression-research-package ICLR 2018

Model pruning seeks to induce sparsity in a deep neural network's various connection matrices, thereby reducing the number of nonzero-valued parameters in the model.
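The sketch below illustrates the two ingredients of gradual magnitude pruning as studied in this paper: a cubic schedule that ramps the target sparsity during training, and a pruning step that zeroes the smallest-magnitude weights to hit that target. The schedule parameters are illustrative.

    import torch

    def target_sparsity(step, s_init=0.0, s_final=0.9, t0=0, n=100, dt=1000):
        """Cubic sparsity schedule: ramp from s_init to s_final over n pruning
        steps spaced dt training steps apart, starting at step t0."""
        t = min(max(step, t0), t0 + n * dt)
        frac = (t - t0) / (n * dt)
        return s_final + (s_init - s_final) * (1.0 - frac) ** 3

    def magnitude_prune(W, sparsity):
        """Zero out the smallest-magnitude fraction `sparsity` of the weights in W."""
        k = int(sparsity * W.numel())
        if k == 0:
            return W
        threshold = W.abs().flatten().kthvalue(k).values
        return torch.where(W.abs() > threshold, W, torch.zeros_like(W))

    W = torch.randn(512, 512)
    W = magnitude_prune(W, target_sparsity(step=50_000))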

Global Sparse Momentum SGD for Pruning Very Deep Neural Networks

DingXiaoH/GSM-SGD NeurIPS 2019

Deep Neural Networks (DNNs) are powerful but computationally expensive and memory-intensive, impeding their practical usage on resource-constrained front-end devices.