Model Compression

342 papers with code • 2 benchmarks • 4 datasets

Model Compression has been an actively pursued area of research over the last few years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are some of the methods proposed to reduce the size of deep networks.

Source: KD-MRI: A knowledge distillation framework for image reconstruction and image restoration in MRI workflow
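
As a rough illustration of two of these techniques, the sketch below applies magnitude-based parameter pruning and symmetric 8-bit weight quantization to a toy weight matrix. It is a minimal NumPy example under assumed defaults (90% sparsity, 8-bit symmetric quantization), not the procedure of any particular paper listed here.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (unstructured pruning)."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.partition(np.abs(weights), k - 1, axis=None)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

def quantize_uniform(weights: np.ndarray, num_bits: int = 8):
    """Symmetric uniform quantization: map floats to signed integers plus one scale."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax).astype(np.int8)
    return q, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(256, 256)).astype(np.float32)

    w_pruned = magnitude_prune(w, sparsity=0.9)        # keep only 10% of weights
    q, scale = quantize_uniform(w_pruned, num_bits=8)  # int8 instead of float32
    w_dequant = q.astype(np.float32) * scale

    print("nonzero fraction:", np.count_nonzero(w_pruned) / w.size)
    print("max reconstruction error:", np.abs(w_pruned - w_dequant).max())
```

A matrix compressed this way would typically be stored as int8 values, a single float scale, and a sparsity mask (or a sparse format), which is where the memory savings come from.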

Latest papers with no code

Comprehensive Survey of Model Compression and Speed up for Vision Transformers

no code yet • 16 Apr 2024

Vision Transformers (ViTs) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks.

Structured Model Pruning for Efficient Inference in Computational Pathology

no code yet • 12 Apr 2024

In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce inference cost for computational and digital pathology-based analysis with a negligible loss of analysis performance.
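
As a generic sketch of what structured pruning can look like (not the procedure used in this paper), the example below removes whole convolutional filters ranked by L1 norm, so the layer itself shrinks and downstream FLOPs drop, rather than merely zeroing individual weights; the keep ratio is an arbitrary choice.

```python
import torch
import torch.nn as nn

def prune_conv_filters(conv: nn.Conv2d, keep_ratio: float) -> nn.Conv2d:
    """Structured pruning: drop whole output filters with the smallest L1 norm."""
    num_keep = max(1, int(keep_ratio * conv.out_channels))
    # L1 norm of each output filter; weight shape is (out, in, kH, kW)
    norms = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = torch.sort(torch.argsort(norms, descending=True)[:num_keep]).values

    pruned = nn.Conv2d(conv.in_channels, num_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    with torch.no_grad():
        pruned.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            pruned.bias.copy_(conv.bias[keep])
    return pruned

conv = nn.Conv2d(64, 128, kernel_size=3, padding=1)
smaller = prune_conv_filters(conv, keep_ratio=0.5)   # 128 -> 64 filters
print(smaller)
```

In a full network, the following layer's input channels would also have to be re-indexed to match the kept filters, which is why structured pruning is usually applied with a dependency-aware pass over the whole model.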

Bayesian Federated Model Compression for Communication and Computation Efficiency

no code yet • 11 Apr 2024

We propose a decentralized Turbo variational Bayesian inference (D-Turbo-VBI) FL framework, in which we first propose a hierarchical sparse prior to promote a clustered sparse structure in the weight matrix.

Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing

no code yet • 11 Apr 2024

For on-device object detection, research has been conducted on designing efficient detectors or compressing models to reduce inference latency.

On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL

no code yet • 3 Apr 2024

Structured data, prevalent in tables, databases, and knowledge graphs, poses a significant challenge in its representation.

Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution

no code yet • 3 Apr 2024

Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model.
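
One common way to transfer intermediate representations, sketched below under assumed toy shapes, is to project the student's feature map to the teacher's channel width and minimize an MSE between the two (FitNets-style feature distillation). The projection layer, shapes, and loss here are generic illustrations, not the multi-granularity mixture-of-priors approach proposed in this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillationLoss(nn.Module):
    """Match a student feature map to a teacher feature map via a learned 1x1 projection."""
    def __init__(self, student_channels: int, teacher_channels: int):
        super().__init__()
        self.proj = nn.Conv2d(student_channels, teacher_channels, kernel_size=1)

    def forward(self, student_feat: torch.Tensor, teacher_feat: torch.Tensor) -> torch.Tensor:
        # The teacher is frozen, so its features are detached from the graph.
        return F.mse_loss(self.proj(student_feat), teacher_feat.detach())

# Toy feature maps: the student is narrower (fewer channels) than the teacher.
student_feat = torch.randn(2, 32, 16, 16)
teacher_feat = torch.randn(2, 64, 16, 16)
loss = FeatureDistillationLoss(32, 64)(student_feat, teacher_feat)
print(loss.item())
```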

Automated Inference of Graph Transformation Rules

no code yet • 3 Apr 2024

The explosion of data available in life sciences is fueling an increasing demand for expressive models and computational methods.

Improve Knowledge Distillation via Label Revision and Data Selection

no code yet • 3 Apr 2024

In addition to the supervision of ground truth, the vanilla KD method regards the predictions of the teacher as soft labels to supervise the training of the student model.
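
A minimal sketch of that vanilla formulation, assuming teacher and student logits plus ground-truth labels: cross-entropy against the hard labels plus a KL term against the teacher's temperature-softened predictions. The temperature and weighting are illustrative defaults, and the label-revision and data-selection steps proposed in the paper are not shown.

```python
import torch
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus KL divergence to the teacher's softened predictions."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # standard temperature scaling of the soft-label term
    return (1 - alpha) * ce + alpha * kd

student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
print(vanilla_kd_loss(student_logits, teacher_logits, labels).item())
```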

Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations

no code yet • 2 Apr 2024

Model compression is therefore important for retaining the performance of larger models while reducing the cost of running them.

Instance-Aware Group Quantization for Vision Transformers

no code yet • 1 Apr 2024

In particular, the distribution of activations for each channel varies drastically across input instances, making PTQ methods designed for CNNs inappropriate for ViTs.
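
A toy contrast of the two calibration choices is sketched below: a single static scale estimated from pooled data versus a scale recomputed per input instance, as instance-aware schemes do. This NumPy example only illustrates why a fixed scale fails when activation ranges differ drastically between instances; it is not the group quantization algorithm proposed in the paper.

```python
import numpy as np

def quantize(x, scale, num_bits=8):
    """Quantize to signed integers with the given scale, then dequantize."""
    qmax = 2 ** (num_bits - 1) - 1
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
# Two "instances" whose activation ranges differ drastically.
a = rng.normal(scale=0.1, size=1000)
b = rng.normal(scale=5.0, size=1000)

static_scale = np.abs(np.concatenate([a, b])).max() / 127   # one scale for all inputs
for name, x in [("small-range input", a), ("large-range input", b)]:
    per_instance_scale = np.abs(x).max() / 127               # recomputed per instance
    err_static = np.abs(x - quantize(x, static_scale)).mean()
    err_dynamic = np.abs(x - quantize(x, per_instance_scale)).mean()
    print(f"{name}: static err {err_static:.4f} vs per-instance err {err_dynamic:.4f}")
```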