Neural Network Compression
74 papers with code • 1 benchmark • 1 dataset
Latest papers with no code
Neural Network Compression using Binarization and Few Full-Precision Weights
Quantization and pruning are two effective model compression methods for deep neural networks.
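The title combines binarization with a small set of full-precision weights. A minimal NumPy sketch of that general idea follows. It assumes, purely for illustration, that the kept weights are the largest-magnitude entries and that binarized weights share one scaling factor (the mean absolute value of the binarized entries). The function name `binarize_with_outliers` and the `keep_frac` parameter are hypothetical, and this is not the paper's method.

```python
import numpy as np

def binarize_with_outliers(w: np.ndarray, keep_frac: float = 0.01) -> np.ndarray:
    """Binarize w to +/-alpha, keeping the largest-magnitude entries in full precision.

    Hypothetical sketch: the outlier-selection rule and the scaling factor are
    illustrative assumptions, not taken from the paper.
    """
    flat = w.ravel()
    k = int(np.ceil(keep_frac * flat.size))
    # Indices of the k largest-magnitude weights; these stay full precision.
    keep_idx = np.argsort(np.abs(flat))[flat.size - k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep_idx] = True
    # One shared scale: mean |w| over the entries that will be binarized.
    alpha = np.abs(flat[~mask]).mean() if (~mask).any() else 0.0
    signs = np.where(flat >= 0, 1.0, -1.0)
    out = np.where(mask, flat, alpha * signs)
    return out.reshape(w.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w_q = binarize_with_outliers(w, keep_frac=0.01)  # 41 of 4096 weights stay full precision
```

Storage then amounts to roughly 1 bit per binarized weight plus the kept values, their indices, and one scale.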
End-to-End Neural Network Compression via $\frac{\ell_1}{\ell_2}$ Regularized Latency Surrogates
Our algorithm is versatile and can be used with many popular compression methods, including pruning, low-rank factorization, and quantization.
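The snippet does not define the regularizer, but the ℓ1/ℓ2 ratio in the title is a standard scale-invariant sparsity measure: for a nonzero vector of length n it equals 1 when exactly one entry is nonzero and √n when all entries have equal magnitude, so minimizing it pushes toward sparsity without shrinking overall scale. The sketch below only computes that ratio. How the paper builds its latency surrogate is not reproduced here, and the helper name `l1_over_l2` is hypothetical.

```python
import numpy as np

def l1_over_l2(v, eps: float = 1e-12) -> float:
    """Scale-invariant sparsity measure ||v||_1 / ||v||_2 (eps avoids division by zero)."""
    v = np.asarray(v, dtype=float)
    return float(np.abs(v).sum() / (np.linalg.norm(v) + eps))

sparse = l1_over_l2([3.0, 0.0, 0.0, 0.0])   # close to 1: maximally sparse
dense = l1_over_l2([1.0, 1.0, 1.0, 1.0])    # close to sqrt(4) = 2: not sparse at all
scaled = l1_over_l2([30.0, 0.0, 0.0, 0.0])  # unchanged by rescaling
```

In a compression setting, such a ratio could be added to a training loss as `task_loss + lam * l1_over_l2(v)` for some vector `v` of per-layer quantities, where both `lam` and the choice of `v` are assumptions.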
Understanding the Effect of the Long Tail on Neural Network Compression
For example, it has been shown that mismatches between full and compressed models can be biased toward under-represented classes.
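One way to check for this kind of bias is to measure how often the full and compressed models disagree, broken down by true class. The sketch below assumes predictions are already available as integer class labels; the function name and toy data are illustrative, not from the paper.

```python
import numpy as np

def per_class_disagreement(y_true, pred_full, pred_compressed, num_classes):
    """Fraction of examples in each true class where the two models' predictions differ.

    Classes with no examples get NaN.
    """
    y_true = np.asarray(y_true)
    disagree = np.asarray(pred_full) != np.asarray(pred_compressed)
    rates = np.full(num_classes, np.nan)
    for c in range(num_classes):
        idx = y_true == c
        if idx.any():
            rates[c] = disagree[idx].mean()
    return rates

# Toy data: class 0 is common, class 1 is rare.
y_true = [0, 0, 0, 0, 1, 1]
pred_full = [0, 0, 0, 0, 1, 1]
pred_comp = [0, 0, 0, 0, 0, 1]
rates = per_class_disagreement(y_true, pred_full, pred_comp, num_classes=2)
```

Here every disagreement falls on the rare class (rate 0.5 versus 0.0), even though overall agreement is 5/6.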
Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference
Modular Transformers train modularized layers that perform the same function as two or more consecutive layers in the original model, using module replacing and knowledge distillation.
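The core idea of training one module to mimic several consecutive layers can be illustrated with output-matching distillation on a toy case. In the NumPy sketch below, the "teacher" is two consecutive linear layers with no nonlinearity, so a single linear module can match it exactly. The student is fit by minibatch gradient descent on a mean-squared-error loss against the teacher's outputs. The layer sizes, learning rate, and step count are arbitrary assumptions, and this is not the paper's Transformer training procedure.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
# Teacher: two consecutive linear layers (no activation in between).
W1 = rng.normal(scale=0.3, size=(d, d))
W2 = rng.normal(scale=0.3, size=(d, d))

def teacher(x):
    return x @ W1.T @ W2.T

def distill_module(steps=500, lr=0.1, batch=64):
    """Train one linear module W so that x @ W.T matches teacher(x)."""
    W = np.zeros((d, d))
    for _ in range(steps):
        x = rng.normal(size=(batch, d))
        target = teacher(x)
        pred = x @ W.T
        # Gradient of 0.5 * mean_b ||pred_b - target_b||^2 with respect to W.
        grad = (pred - target).T @ x / batch
        W -= lr * grad
    return W

W_student = distill_module()
```

In a real Transformer the replaced layers are nonlinear, so the student can only approximate them and the distillation loss would not reach zero.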
Evaluation Metrics for DNNs Compression
There is substantial ongoing research into techniques for neural network compression.
How Informative is the Approximation Error from Tensor Decomposition for Neural Network Compression?
While scaling the approximation error is commonly used to account for the different sizes of layers, the average correlation across layers is smaller than across all choices (i.e., layers, decompositions, and level of compression) before fine-tuning.
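For a concrete notion of "approximation error", consider the simplest decomposition, a truncated SVD of one weight matrix (a matrix special case of tensor decompositions such as CP or Tucker). By the Eckart–Young theorem, the Frobenius error of the best rank-r approximation equals the root of the sum of the squared discarded singular values. The sketch reports that error relative to ‖W‖_F, which is one possible size normalization. The function name and the choice of relative Frobenius error are assumptions, not the paper's exact setup.

```python
import numpy as np

def low_rank_approx(W: np.ndarray, rank: int):
    """Best rank-`rank` approximation of W via truncated SVD, plus its relative error."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    W_r = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]
    rel_err = np.linalg.norm(W - W_r) / np.linalg.norm(W)  # Frobenius norms
    return W_r, rel_err, s

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
r = 8
W_r, rel_err, s = low_rank_approx(W, r)
# Storing the factors costs r * (64 + 32) = 768 parameters instead of 64 * 32 = 2048.
```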
Guaranteed Quantization Error Computation for Neural Network Model Compression
Neural network model compression techniques can address the computational cost of deep neural networks on embedded devices in industrial systems.
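As a simple illustration of a guaranteed (worst-case) quantization error, consider one layer y = ReLU(Wx) with inputs bounded by ‖x‖∞ ≤ ρ. Because ReLU is 1-Lipschitz elementwise, ‖ReLU(Wx) − ReLU(W_q x)‖∞ ≤ ‖W − W_q‖∞ · ρ, where the matrix ∞-norm is the maximum absolute row sum. The sketch below uses symmetric uniform quantization and this norm bound. Both are assumptions chosen for illustration, not the computation method of the paper.

```python
import numpy as np

def quantize_uniform(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    """Symmetric uniform quantization to signed num_bits levels, then dequantize."""
    max_abs = np.abs(w).max()
    if max_abs == 0:
        return w.copy()
    scale = max_abs / (2 ** (num_bits - 1) - 1)
    return np.round(w / scale) * scale

def output_error_bound(W: np.ndarray, W_q: np.ndarray, x_inf_radius: float) -> float:
    """Worst-case ||relu(W x) - relu(W_q x)||_inf over all ||x||_inf <= x_inf_radius."""
    # ord=np.inf on a matrix gives the maximum absolute row sum.
    return float(np.linalg.norm(W - W_q, ord=np.inf) * x_inf_radius)

rng = np.random.default_rng(1)
W = rng.normal(size=(16, 32))
W_q = quantize_uniform(W, num_bits=4)
bound = output_error_bound(W, W_q, x_inf_radius=1.0)
```

The bound holds for every admissible input, so sampled inputs can only confirm it, never exceed it. It is usually loose, and tighter guarantees need finer analysis of the network.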
AcceRL: Policy Acceleration Framework for Deep Reinforcement Learning
Inspired by the redundancy of neural networks, we propose AcceRL, a lightweight parallel training framework based on neural network compression, to accelerate policy learning while preserving policy quality.
Partial Binarization of Neural Networks for Budget-Aware Efficient Learning
To address this issue, partial binarization techniques have been developed, but a systematic approach to mixing binary and full-precision parameters in a single network is still lacking.
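To make "budget-aware" mixing concrete, the sketch below counts model size when some layers are binarized (1 bit per weight plus one 32-bit scale per layer) and the rest stay 32-bit. It then applies a hypothetical greedy rule: binarize the largest layers first until the model fits a bit budget. The bit widths, the per-layer scale, and the greedy rule are all illustrative assumptions, not the paper's approach.

```python
def model_size_bits(layer_params, binarized, fp_bits=32, scale_bits=32):
    """Total weight storage in bits for a mix of binarized and full-precision layers."""
    total = 0
    for n, is_bin in zip(layer_params, binarized):
        total += (n + scale_bits) if is_bin else (n * fp_bits)
    return total

def greedy_binarize(layer_params, budget_bits, fp_bits=32, scale_bits=32):
    """Hypothetical heuristic: binarize the largest layers first until within budget.

    If the budget is infeasible, every layer ends up binarized.
    """
    binarized = [False] * len(layer_params)
    order = sorted(range(len(layer_params)), key=lambda i: layer_params[i], reverse=True)
    for i in order:
        if model_size_bits(layer_params, binarized, fp_bits, scale_bits) <= budget_bits:
            break
        binarized[i] = True
    return binarized

layers = [1_000, 50_000, 200_000, 10_000]   # parameter counts per layer (toy values)
full_size = model_size_bits(layers, [False] * 4)   # 261,000 * 32 = 8,352,000 bits
choice = greedy_binarize(layers, budget_bits=2_000_000)
```

With these numbers, binarizing only the 200,000-parameter layer gives 2,152,032 bits, still over budget, so the 50,000-parameter layer is binarized too (602,064 bits).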
Neural Network Compression by Joint Sparsity Promotion and Redundancy Reduction
Compression of convolutional neural network models has recently been dominated by pruning approaches.
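The pruning approaches mentioned here are often compared against plain unstructured magnitude pruning, which zeroes the smallest-magnitude fraction of weights. The NumPy sketch below shows that baseline only. It is not the joint sparsity-promotion and redundancy-reduction method of the paper, and the function name is hypothetical.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float):
    """Zero out the `sparsity` fraction of weights with the smallest magnitude.

    Returns the pruned weights and a boolean mask of kept entries.
    """
    flat = np.abs(w).ravel()
    k = int(np.floor(sparsity * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    if k > 0:
        # Indices of the k smallest-magnitude weights are removed.
        mask[np.argsort(flat)[:k]] = False
    mask = mask.reshape(w.shape)
    return w * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(10, 10))
pruned, mask = magnitude_prune(w, sparsity=0.9)  # keeps the 10 largest-magnitude weights
```

In practice, pruning is usually followed by fine-tuning with the mask fixed, so the remaining weights can compensate for the removed ones.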