Model Compression
342 papers with code • 2 benchmarks • 4 datasets
Model Compression has been an actively pursued area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.
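As a rough, self-contained illustration of the first of these techniques, the sketch below applies global magnitude pruning to a toy PyTorch model using torch.nn.utils.prune. The architecture and the 50% sparsity level are arbitrary placeholders, not settings from any paper listed on this page.

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Toy model standing in for a real network (hypothetical architecture).
model = nn.Sequential(nn.Linear(784, 256), nn.ReLU(), nn.Linear(256, 10))

# Collect the weight tensors to prune.
params_to_prune = [(m, "weight") for m in model.modules() if isinstance(m, nn.Linear)]

# Zero out the 50% of weights with the smallest magnitude across the whole model.
prune.global_unstructured(params_to_prune, pruning_method=prune.L1Unstructured, amount=0.5)

# Make the pruning permanent (folds the masks into the weight tensors).
for module, name in params_to_prune:
    prune.remove(module, name)

sparsity = sum((m.weight == 0).sum().item() for m, _ in params_to_prune) / \
           sum(m.weight.numel() for m, _ in params_to_prune)
print(f"global sparsity: {sparsity:.0%}")
```

In practice the pruned model would then be fine-tuned to recover accuracy; unstructured sparsity like this mainly reduces storage, while speedups require sparse kernels or structured pruning.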
Latest papers with no code
Comprehensive Survey of Model Compression and Speed up for Vision Transformers
Vision Transformers (ViT) have marked a paradigm shift in computer vision, outperforming state-of-the-art models across diverse tasks.
Structured Model Pruning for Efficient Inference in Computational Pathology
In this work, we demonstrate that model pruning, as a model compression technique, can effectively reduce the inference cost of computational and digital pathology-based analysis with a negligible loss of analysis performance.
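The paper's exact pruning recipe is not reproduced here; as a generic illustration of structured pruning (the variant that actually shrinks compute at inference time), the sketch below removes whole output channels of a convolution by L2 norm using PyTorch's built-in pruning utilities. The layer and the 30% ratio are placeholders.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

conv = nn.Conv2d(64, 128, kernel_size=3)  # placeholder layer

# Structured pruning: zero the 30% of output channels (dim=0) with the
# smallest L2 norm, rather than pruning individual scattered weights.
prune.ln_structured(conv, name="weight", amount=0.3, n=2, dim=0)
prune.remove(conv, "weight")  # bake the mask into the weight tensor
```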
Bayesian Federated Model Compression for Communication and Computation Efficiency
We propose a decentralized Turbo variational Bayesian inference (D-Turbo-VBI) FL framework, in which we first propose a hierarchical sparse prior to promote a clustered sparse structure in the weight matrix.
Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing
For on-device object detection, research has focused on designing efficient detectors or compressing models to reduce inference latency.
On Linearizing Structured Data in Encoder-Decoder Language Models: Insights from Text-to-SQL
Structured data, prevalent in tables, databases, and knowledge graphs, poses a significant challenge in its representation.
Knowledge Distillation with Multi-granularity Mixture of Priors for Image Super-Resolution
Knowledge distillation (KD) is a promising yet challenging model compression technique that transfers rich learning representations from a well-performing but cumbersome teacher model to a compact student model.
Automated Inference of Graph Transformation Rules
The explosion of data available in life sciences is fueling an increasing demand for expressive models and computational methods.
Improve Knowledge Distillation via Label Revision and Data Selection
In addition to the supervision of ground truth, the vanilla KD method regards the predictions of the teacher as soft labels to supervise the training of the student model.
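As a minimal sketch of that vanilla KD objective (Hinton-style soft labels, not the label-revision and data-selection method of this paper), the loss below mixes cross-entropy on the ground truth with a temperature-scaled KL term against the teacher's predictions. T and alpha are illustrative hyperparameters.

```python
import torch
import torch.nn.functional as F

def vanilla_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Standard KD loss: hard-label CE plus temperature-softened KL to the teacher.

    T and alpha are assumed example values, not settings from the paper.
    """
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so gradient magnitude matches the hard-label term
    return alpha * ce + (1 - alpha) * kl
```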
Enhancing Inference Efficiency of Large Language Models: Investigating Optimization Strategies and Architectural Innovations
Model compression is therefore important for retaining the performance of larger models while reducing the cost of running them.
Instance-Aware Group Quantization for Vision Transformers
In particular, the distributions of activations for each channel vary drastically across input instances, making PTQ methods designed for CNNs inappropriate for ViTs.
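The paper's group-assignment algorithm is not reproduced here; the sketch below only illustrates the underlying idea of computing quantization ranges per channel and per input instance, rather than once per tensor as in typical CNN-oriented PTQ.

```python
import torch

def quantize_per_instance_channel(x, num_bits=8):
    """Uniform affine quantization with per-(instance, channel) min/max ranges.

    A simplified stand-in for instance-aware quantization; not the paper's
    group quantization method.
    """
    qmax = 2 ** num_bits - 1
    # x: (batch, channels, ...). Ranges are computed separately for every
    # instance and channel, so channels with drastically different
    # activation distributions each get their own scale.
    flat = x.flatten(2)                      # (B, C, N)
    x_min = flat.min(dim=-1, keepdim=True).values
    x_max = flat.max(dim=-1, keepdim=True).values
    scale = (x_max - x_min).clamp(min=1e-8) / qmax
    q = ((flat - x_min) / scale).round().clamp(0, qmax)
    return (q * scale + x_min).view_as(x)    # dequantized activations

# Example: activations whose per-channel ranges differ wildly across instances.
x = torch.randn(2, 4, 8, 8) * torch.rand(2, 4, 1, 1) * 10
print((x - quantize_per_instance_channel(x)).abs().max())
```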