Model Compression

342 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power and resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are some of the methods proposed to compress the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
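
To make the weight-quantization idea concrete, the sketch below applies symmetric per-tensor int8 quantization to a linear layer's weights, shrinking them from 4 bytes to 1 byte per parameter. This is a minimal illustration with names of our own choosing, not code from any of the papers listed on this page.

```python
import torch

def quantize_weights_int8(weight: torch.Tensor):
    """Symmetric per-tensor int8 quantization of a weight matrix."""
    scale = weight.abs().max() / 127.0           # map the largest magnitude to 127
    q = torch.clamp(torch.round(weight / scale), -128, 127).to(torch.int8)
    return q, scale                              # store int8 values plus one fp32 scale

def dequantize(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    return q.float() * scale                     # approximate reconstruction of the weights

# Example: compress a linear layer's fp32 weights (4 bytes each) to int8 (1 byte each)
layer = torch.nn.Linear(512, 512)
q, scale = quantize_weights_int8(layer.weight.data)
w_hat = dequantize(q, scale)
print("max reconstruction error:", (layer.weight.data - w_hat).abs().max().item())
```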

Most implemented papers

Training with Quantization Noise for Extreme Model Compression

pytorch/fairseq ICLR 2021

A standard solution is to train networks with Quantization Aware Training, where the weights are quantized during training and the gradients approximated with the Straight-Through Estimator.
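
The Straight-Through Estimator is commonly implemented with a detach trick: the forward pass sees the quantized weights, while the backward pass treats quantization as the identity. The sketch below shows this generic pattern under our own naming; it is not the fairseq implementation.

```python
import torch

def fake_quantize_ste(w: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Quantize in the forward pass; pass gradients straight through in the backward pass."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Straight-through estimator: forward uses w_q, backward sees d(w_q)/dw = 1.
    return w + (w_q - w).detach()

w = torch.randn(4, 4, requires_grad=True)
loss = fake_quantize_ste(w).sum()
loss.backward()
print(w.grad)  # all ones: the gradient bypasses the non-differentiable rounding
```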

LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search

microsoft/NeuralSpeech 8 Feb 2021

Text to speech (TTS) has been broadly used to synthesize natural and intelligible speech in different scenarios.

Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

eparisotto/ActorMimic 19 Nov 2015

The ability to act in multiple environments and transfer previous knowledge to new situations can be considered a critical aspect of any intelligent agent.

MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Face Images

cuguilke/microexpnet 19 Nov 2017

On the other hand, KD is proven to be useful for model compression for the FER problem, and we discovered that its effect becomes more and more significant as the model size decreases.
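
Several of the entries on this page rely on knowledge distillation, where a small student is trained to mimic the softened outputs of a large teacher. The sketch below shows the standard soft-target KD loss as a generic illustration; it is not the exact objective used by MicroExpNet or the other listed papers, and the hyperparameter values are placeholders.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T: float = 4.0, alpha: float = 0.7):
    """Weighted sum of the soft-target distillation loss and the standard cross-entropy."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)                       # rescale so gradients match the hard-label term
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1.0 - alpha) * hard

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(kd_loss(student_logits, teacher_logits, labels))
```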

Patient Knowledge Distillation for BERT Model Compression

intersun/PKD-for-BERT-Model-Compression IJCNLP 2019

Pre-trained language models such as BERT have proven to be highly effective for natural language processing (NLP) tasks.

Contrastive Representation Distillation

HobbitLong/RepDistiller ICLR 2020

We demonstrate that the standard knowledge distillation objective ignores important structural knowledge of the teacher network.
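
Contrastive distillation instead encourages the student to preserve relationships across the teacher's representation space. The code below is a rough InfoNCE-style sketch of that idea, a simplification rather than the paper's exact CRD formulation (which uses a memory buffer of negatives).

```python
import torch
import torch.nn.functional as F

def contrastive_distill_loss(student_feat, teacher_feat, temperature: float = 0.1):
    """Pull each student embedding toward its own teacher embedding, away from others in the batch."""
    s = F.normalize(student_feat, dim=-1)
    t = F.normalize(teacher_feat, dim=-1)
    logits = s @ t.t() / temperature              # (batch, batch) similarity matrix
    targets = torch.arange(s.size(0))             # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

student_feat = torch.randn(16, 128, requires_grad=True)
teacher_feat = torch.randn(16, 128)
print(contrastive_distill_loss(student_feat, teacher_feat))
```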

Data-Free Adversarial Distillation

VainF/Data-Free-Adversarial-Distillation 23 Dec 2019

Knowledge Distillation (KD) has made remarkable progress in the last few years and become a popular paradigm for model compression and knowledge transfer.

ZeroQ: A Novel Zero Shot Quantization Framework

amirgholami/ZeroQ CVPR 2020

Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch of ResNet50 training time on ImageNet).

Sharpness-aware Quantization for Deep Neural Networks

ziplab/saq 24 Nov 2021

However, the abrupt changes in quantized weights during training often lead to severe loss fluctuations and result in a sharp loss landscape, making the gradients unstable and thus degrading performance.

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale

microsoft/DeepSpeed 14 Jan 2022

As the training of giant dense models hits the limits of the availability and capability of today's hardware resources, Mixture-of-Experts (MoE) models have become one of the most promising model architectures due to their significant training cost reduction compared to a quality-equivalent dense model.
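
The cost advantage of MoE comes from conditional computation: a router activates only a small subset of experts per token, so parameter count grows without a proportional increase in compute. Below is a toy top-1-gated MoE layer of our own; it is not DeepSpeed-MoE's optimized implementation, which additionally handles load balancing, capacity limits, and expert parallelism.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Top1MoE(nn.Module):
    """Toy mixture-of-experts layer: each token is routed to a single expert FFN."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        gate_probs = F.softmax(self.gate(x), dim=-1)
        expert_idx = gate_probs.argmax(dim=-1)     # top-1 routing decision per token
        out = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            mask = expert_idx == i
            if mask.any():
                # Scale by the gate probability so the routing weights stay differentiable.
                out[mask] = expert(x[mask]) * gate_probs[mask, i].unsqueeze(-1)
        return out

moe = Top1MoE(d_model=64, d_hidden=256, num_experts=4)
tokens = torch.randn(32, 64)
print(moe(tokens).shape)  # torch.Size([32, 64])
```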