Prior art often discretizes the network weights by carefully tuning quantization hyper-parameters (e.g., non-uniform stepsizes and layer-wise bitwidths), a process that is complicated and sub-optimal because of the large discrepancy between the full-precision and low-precision models.
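For context, here is a minimal sketch of the kind of layer-wise uniform quantization this excerpt alludes to; the function name and the symmetric max-abs stepsize rule are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniformly quantize a weight tensor to signed num_bits integers.

    Illustrative assumption: the stepsize is derived per layer from the
    weight range, which is why each layer (and each bitwidth) needs its
    own calibration.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    step = np.abs(w).max() / qmax           # layer-wise stepsize
    q = np.clip(np.round(w / step), -qmax, qmax)
    return q * step                          # de-quantized weights

w = np.random.randn(256, 256).astype(np.float32)
w_q = quantize_uniform(w, num_bits=4)
print("max abs error:", np.abs(w - w_q).max())
```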
Modern neural networks achieve state-of-the-art performance on tasks across computer vision, natural language processing, and related verticals.
In this work, we analyze the effect of various compression techniques on UAP (universal adversarial perturbation) attacks, including different forms of pruning and quantization.
The experimental results, obtained on an AMD server with four GeForce RTX 2080Ti GPUs, show that our algorithm achieves a 3x speedup plus 19% energy savings on VGG distillation, and a 3.5x speedup plus 29% energy savings on ResNet distillation, both with negligible accuracy loss.
While knowledge distillation (transfer) has been attracting attention from the research community, recent developments in the field have heightened the need for reproducible studies and highly generalized frameworks that lower the barriers to such high-quality, reproducible deep learning research.
In this paper, we propose to modify the structure and training process of DNN models for complex image classification tasks to achieve in-network compression in the early network layers.
Bayesian optimization (BO) is a sample-efficient global optimization algorithm for black-box functions that are expensive to evaluate.
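As a hedged illustration of the BO loop described here (not the excerpted paper's algorithm), the sketch below fits a Gaussian-process surrogate and picks the next sample by the standard expected-improvement acquisition; the helper names, the 1-D search space, and the random candidate grid are assumptions made for brevity:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(x_cand, gp, y_best):
    """Expected improvement (for minimization) at candidate points."""
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # guard against zero variance
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds, n_init=5, n_iter=20, seed=0):
    """Minimal BO loop: GP surrogate + expected-improvement acquisition."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=(n_init, 1))
    y = np.array([f(x[0]) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(*bounds, size=(256, 1))   # random candidate grid
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))               # one expensive evaluation
    return X[np.argmin(y)], y.min()

x_best, y_best = bayes_opt(lambda x: (x - 2.0) ** 2, bounds=(-5.0, 5.0))
print(x_best, y_best)
```

The point of the loop is sample efficiency: each iteration spends cheap surrogate computation to decide where the single expensive evaluation should go.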
In this paper, the problem of pruning and compressing the weights of various layers of deep neural networks is investigated.
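As a point of reference for layer-wise weight pruning, here is a minimal magnitude-pruning baseline; it is illustrative only and not the specific method investigated in the excerpted paper:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights in a layer.

    One of the simplest pruning baselines: keep only the largest
    (1 - sparsity) fraction of weights by absolute value.
    """
    k = int(sparsity * w.size)
    threshold = np.partition(np.abs(w), k, axis=None)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(512, 512).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
print("fraction zeroed:", np.mean(w_pruned == 0.0))
```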
Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems.
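A minimal sketch of one common way to get a Hessian-vector product without ever materializing the Hessian: central finite differences of the gradient (in practice autodiff via Pearlmutter's trick is usually preferred). The quadratic test function and its analytic gradient are assumptions for illustration:

```python
import numpy as np

def hvp_fd(grad_f, x, v, eps=1e-5):
    """Hessian-vector product via finite differences of the gradient:
    H(x) v ~= (grad_f(x + eps*v) - grad_f(x - eps*v)) / (2*eps).
    Only two gradient evaluations; the full Hessian is never formed.
    """
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2.0 * eps)

# Toy check on f(x) = 0.5 * x^T A x, whose Hessian is exactly A.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = 0.5 * (A + A.T)                      # symmetrize so grad f = A x
grad_f = lambda x: A @ x                 # analytic gradient of f
x, v = rng.standard_normal(50), rng.standard_normal(50)
print(np.allclose(hvp_fd(grad_f, x, v), A @ v, atol=1e-4))
```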
Linear layers still occupy a significant portion of the parameters in recurrent neural networks (RNNs).
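A quick arithmetic check of that claim for a vanilla LSTM, where the four gate projections are all dense linear maps; the layer sizes below are illustrative assumptions, not taken from the excerpted paper:

```python
def lstm_param_count(input_size, hidden_size):
    """Parameter count of a single vanilla LSTM layer."""
    # Per gate: W_x (hidden x input) + W_h (hidden x hidden) + bias.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 4 * per_gate  # input, forget, cell, output gates

d, h = 512, 1024
total = lstm_param_count(d, h)
linear = 4 * (h * d + h * h)             # weight matrices only, no biases
print(f"total: {total:,}, share in linear maps: {linear / total:.1%}")
```

With these sizes, the dense weight matrices account for well over 99% of the layer's parameters, which is why compressing the linear layers is where the leverage is.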