Model Compression
343 papers with code • 2 benchmarks • 4 datasets
Model Compression is an actively pursued area of research over the last few years with the goal of deploying state-of-the-art deep networks in low-power and resource limited devices without significant drop in accuracy. Parameter pruning, low-rank factorization and weight quantization are some of the proposed methods to compress the size of deep networks.
Libraries
Use these libraries to find Model Compression models and implementationsLatest papers
Torch2Chip: An End-to-end Customizable Deep Neural Network Compression and Deployment Toolkit for Prototype Hardware Accelerator Design
The limited degree of freedom in the current toolkit and the under-explored customization hinder the prototype ASIC or FPGA-based accelerator design.
Data-free Knowledge Distillation for Fine-grained Visual Categorization
Our approach utilizes an adversarial distillation framework with attention generator, mixed high-order attention distillation, and semantic feature contrast learning.
Transferable and Principled Efficiency for Open-Vocabulary Segmentation
In the context of efficient OVS, we target achieving performance that is comparable to or even better than prior OVS works based on large vision-language foundation models, by utilizing smaller models that incur lower training costs.
Multilingual Brain Surgeon: Large Language Models Can be Compressed Leaving No Language Behind
MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets.
Are Compressed Language Models Less Subgroup Robust?
To reduce the inference cost of large language models, model compression is increasingly used to create smaller scalable models.
Tiny Models are the Computational Saver for Large Models
By searching and employing the most appropriate tiny model as the computational saver for a given large model, the proposed approaches work as a novel and generic method to model compression.
Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency
We present experiments on two benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness performance comparable to adversarially trained models, while also improving computational efficiency.
SVD-LLM: Truncation-aware Singular Value Decomposition for Large Language Model Compression
The advancements in Large Language Models (LLMs) have been hindered by their substantial sizes, which necessitate LLM compression methods for practical deployment.
Bit-mask Robust Contrastive Knowledge Distillation for Unsupervised Semantic Hashing
In this paper, we propose an innovative Bit-mask Robust Contrastive knowledge Distillation (BRCD) method, specifically devised for the distillation of semantic hashing models.
DyCE: Dynamic Configurable Exiting for Deep Learning Compression and Scaling
Moreover, most current dynamic compression designs are monolithic and tightly integrated with base models, thereby complicating the adaptation to novel base models.