Model Compression
343 papers with code • 2 benchmarks • 4 datasets
Model Compression has been an active area of research in recent years, with the goal of deploying state-of-the-art deep networks on low-power, resource-limited devices without a significant drop in accuracy. Parameter pruning, low-rank factorization, and weight quantization are among the methods proposed to reduce the size of deep networks.
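Two of the techniques named above can be sketched in a few lines of plain Python: magnitude-based parameter pruning (zeroing the smallest-magnitude weights) and symmetric linear quantization to signed 8-bit integers. The function names and the toy weight vector are illustrative choices, not part of any specific library.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    return [0.0 if abs(w) <= threshold else w for w in weights]

def quantize_int8(weights):
    """Symmetric linear quantization of weights to signed 8-bit integers."""
    scale = max(abs(w) for w in weights) / 127 or 1.0  # guard against all-zero input
    q = [round(w / scale) for w in weights]            # each q[i] is in [-127, 127]
    return q, scale

weights = [0.8, -0.05, 0.3, -0.9, 0.02, 0.4]
pruned = magnitude_prune(weights, 0.5)     # half the weights set to zero
q, scale = quantize_int8(pruned)
dequant = [v * scale for v in q]           # approximate reconstruction
```

In practice these ideas are applied per-layer (or per-channel) to tensors rather than flat lists, and pruned models are usually fine-tuned afterwards to recover accuracy; this sketch only shows the core arithmetic.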
Libraries
Use these libraries to find Model Compression models and implementations
Latest papers
Model Compression Techniques in Biometrics Applications: A Survey
The development of deep learning algorithms has extensively empowered humanity's task automatization capacity.
Dynamic DNNs and Runtime Management for Efficient Inference on Mobile/Embedded Devices
In this thesis, we propose a combined method: a system for DNN performance trade-off management that exploits runtime trade-off opportunities in both algorithms and hardware to meet dynamically changing application performance targets and hardware constraints in real time.
Knowledge Translation: A New Pathway for Model Compression
Deep learning has witnessed significant advancements in recent years at the cost of increasing training, inference, and model storage overhead.
Safety and Performance, Why Not Both? Bi-Objective Optimized Model Compression against Heterogeneous Attacks Toward AI Software Deployment
To mitigate this issue, AI software compression plays a crucial role; it aims to reduce model size while maintaining high performance.
Generative Model-based Feature Knowledge Distillation for Action Recognition
Addressing this gap, our paper introduces a knowledge distillation framework that uses a generative model to train a lightweight student model.
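The core of any knowledge distillation setup is a soft-target loss: the student is trained to match the teacher's temperature-softened output distribution. Below is a minimal stdlib-only sketch of that loss (the classic KL-divergence formulation with a T² scaling factor); it is a generic illustration, not the generative feature-distillation method of the paper above, and all names and logit values are invented for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                         # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=4.0):
    """KL divergence between softened teacher and student distributions,
    scaled by T^2 so gradients stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)  # teacher's soft targets
    q = softmax(student_logits, temperature)  # student's softened predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

teacher = [3.0, 1.0, 0.2]   # hypothetical teacher logits for one example
student = [2.5, 1.2, 0.1]   # hypothetical student logits
loss = distillation_loss(teacher, student)  # non-negative, 0 iff distributions match
```

A full training loop would typically combine this term with the ordinary cross-entropy on the hard labels, weighted by a mixing coefficient.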
Rethinking Compression: Reduced Order Modelling of Latent Features in Large Language Models
Due to the substantial scale of Large Language Models (LLMs), the direct application of conventional compression methodologies proves impractical.
Understanding the Effect of Model Compression on Social Bias in Large Language Models
Large Language Models (LLMs) trained with self-supervision on vast corpora of web text fit to the social biases of that text.
Language Model Knowledge Distillation for Efficient Question Answering in Spanish
Recent advances in the development of pre-trained Spanish language models have led to significant progress in many Natural Language Processing (NLP) tasks, such as question answering.
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
The rapid growth of Large Language Models (LLMs) has been a driving force in transforming various domains, reshaping the artificial general intelligence landscape.
Physics Inspired Criterion for Pruning-Quantization Joint Learning
Pruning-quantization joint learning always facilitates the deployment of deep neural networks (DNNs) on resource-constrained edge devices.