Neural Network Compression
74 papers with code • 1 benchmark • 1 dataset
Most implemented papers
A Closer Look at Structured Pruning for Neural Network Compression
Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channel connections and fine-tuning, reducing the overall width of the network.
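A minimal PyTorch sketch of that remove-then-fine-tune alternation, assuming an L1-norm channel-importance criterion and zeroing pruned filters in place (true structural removal would also shrink the next layer's input channels); `prune_channels`, `prune_and_finetune`, and `train_step` are illustrative names, not the paper's code.

```python
import torch
import torch.nn as nn

def prune_channels(conv: nn.Conv2d, frac: float = 0.1) -> None:
    """Zero out the output channels of `conv` with the smallest L1 norm.
    Zeroing stands in for true structural removal."""
    with torch.no_grad():
        importance = conv.weight.abs().sum(dim=(1, 2, 3))  # one score per filter
        n_prune = int(frac * conv.out_channels)
        if n_prune == 0:
            return
        drop = importance.argsort()[:n_prune]  # least important channels
        conv.weight[drop] = 0.0
        if conv.bias is not None:
            conv.bias[drop] = 0.0

def prune_and_finetune(model: nn.Module, train_step, rounds: int = 5, frac: float = 0.1):
    """Alternate between pruning every Conv2d and a burst of fine-tuning steps,
    mirroring the alternation described above."""
    for _ in range(rounds):
        for m in model.modules():
            if isinstance(m, nn.Conv2d):
                prune_channels(m, frac)
        for _ in range(100):  # fine-tune to recover accuracy
            train_step(model)
```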
ECC: Platform-Independent Energy-Constrained Deep Neural Network Compression via a Bilinear Regression Model
The energy estimate model allows us to formulate DNN compression as a constrained optimization that minimizes the DNN loss function over the energy constraint.
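A rough sketch of that formulation, assuming a bilinear energy model over per-layer weight densities and a simple quadratic penalty in place of the paper's constrained solver; the function names and the coefficients `a`, `B` are illustrative.

```python
import torch

def energy_estimate(density: torch.Tensor, a: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Assumed bilinear energy model over per-layer densities s:
    E(s) = a^T s + s^T B s, with a and B fit by regression against
    energy measured on the target platform."""
    return a @ density + density @ B @ density

def energy_constrained_loss(task_loss, density, a, B, budget, rho=1.0):
    """Task loss plus a quadratic penalty for exceeding the energy budget,
    standing in for the constrained optimization described above."""
    violation = torch.relu(energy_estimate(density, a, B) - budget)
    return task_loss + rho * violation ** 2
```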
Learning Sparse Networks Using Targeted Dropout
Before computing the gradients for each weight update, targeted dropout stochastically selects a set of units or weights to be dropped using a simple self-reinforcing sparsity criterion and then computes the gradients for the remaining weights.
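A sketch of weight-level targeted dropout as described: the lowest-magnitude fraction of weights become drop candidates, and each candidate is zeroed with some probability before the gradient step; `targ_frac` and `drop_prob` are assumed parameter names, not the paper's.

```python
import torch

def targeted_weight_dropout(weight: torch.Tensor, targ_frac=0.5, drop_prob=0.5) -> torch.Tensor:
    """Mark the lowest-magnitude fraction of weights as drop candidates and zero
    each candidate with probability drop_prob; using the returned masked weights
    in the forward pass means gradients flow only to the surviving entries."""
    w = weight.reshape(-1)
    n_target = int(targ_frac * w.numel())
    if n_target == 0:
        return weight
    candidates = w.abs().argsort()[:n_target]          # smallest-magnitude weights
    keep = (torch.rand(n_target, device=w.device) > drop_prob).float()
    mask = torch.ones_like(w)
    mask[candidates] = keep
    return (w * mask).reshape(weight.shape)
```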
Forward and Backward Information Retention for Accurate Binary Neural Networks
Our empirical study indicates that the quantization brings information loss in both forward and backward propagation, which is the bottleneck of training accurate binary neural networks.
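For context, a sketch of the standard sign binarization with a clipped straight-through estimator that such papers analyze: the forward pass keeps only the sign (losing amplitude information) and the backward pass drops gradients outside [-1, 1]. This is the common baseline, not the paper's information-retention scheme.

```python
import torch

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a clipped straight-through estimator."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # pass gradients only where |x| <= 1 (clipped straight-through)
        return grad_out * (x.abs() <= 1).float()

binarize = BinarizeSTE.apply  # e.g. binary_weights = binarize(real_valued_weights)
```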
Distilled Split Deep Neural Networks for Edge-Assisted Real-Time Systems
Offloading the execution of complex Deep Neural Networks (DNNs) models to compute-capable devices at the network edge, that is, edge servers, can significantly reduce capture-to-output delay.
Neural Network Compression Framework for fast model inference
In this work we present a new framework for neural network compression with fine-tuning, which we call the Neural Network Compression Framework (NNCF).
The continuous categorical: a novel simplex-valued exponential family
Simplex-valued data appear throughout statistics and machine learning, for example in the context of transfer learning and compression of deep networks.
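A sketch of the density form the title refers to, written up to its normalizing constant (which the paper derives in closed form); the notation here is assumed, not copied from the paper.

```latex
% Continuous categorical density on the simplex \Delta^{K-1}, up to the
% normalizing constant C(\lambda):
p(x; \lambda) \;=\; C(\lambda) \prod_{i=1}^{K} \lambda_i^{\,x_i},
\qquad x \in \Delta^{K-1}, \quad \lambda_i > 0 .
```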
Teacher-Class Network: A Neural Network Compression Mechanism
To reduce the overwhelming size of Deep Neural Networks (DNNs), the teacher-student methodology tries to transfer knowledge from a complex teacher network to a simple student network.
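A sketch of the standard teacher-student (knowledge distillation) loss this line of work builds on; the Teacher-Class paper itself distributes the transferred knowledge across several small students, which is not shown here.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Weighted sum of hard-label cross-entropy and KL divergence to the
    teacher's temperature-softened outputs (standard knowledge distillation)."""
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # rescale so soft-target gradients keep a comparable magnitude
    return alpha * hard + (1 - alpha) * soft
```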
Head Network Distillation: Splitting Distilled Deep Neural Networks for Resource-Constrained Edge Computing Systems
In this paper, we propose to modify the structure and training process of DNN models for complex image classification tasks to achieve in-network compression in the early network layers.
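A minimal sketch of one head-distillation step, under the assumption that a compressed head is trained to regress the frozen original head's intermediate features before the network is split for edge offloading; module and function names are hypothetical, not the paper's code.

```python
import torch
import torch.nn.functional as F

def head_distillation_step(compressed_head, original_head, x, optimizer):
    """Train the small head to reproduce the frozen original head's intermediate
    features; once trained, the network can be split at this bottleneck, with the
    head running on-device and the remaining layers on the edge server."""
    with torch.no_grad():
        target = original_head(x)      # features from the original early layers
    pred = compressed_head(x)          # features from the compressed replacement
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```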
Few-Bit Backward: Quantized Gradients of Activation Functions for Memory Footprint Reduction
Every modern neural network model has quite a few pointwise nonlinearities in its architecture, and each such operation induces additional memory costs which -- as we show -- can be significantly reduced by quantization of the gradients.
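An illustrative autograd sketch of the idea for GELU, assuming a uniform few-bit grid for the stored derivative (the paper instead optimizes the quantization levels); the class and variable names are not from the paper's code.

```python
import math
import torch
import torch.nn.functional as F

class FewBitGELU(torch.autograd.Function):
    """GELU whose backward pass uses a few-bit quantization of the activation's
    derivative instead of the full-precision input, shrinking the memory kept
    alive between the forward and backward passes."""

    @staticmethod
    def forward(ctx, x, bits: int = 2):
        y = F.gelu(x)
        # exact GELU derivative: Phi(x) + x * phi(x)
        phi = torch.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        Phi = 0.5 * (1.0 + torch.erf(x / math.sqrt(2.0)))
        dydx = Phi + x * phi
        # uniform quantization of the derivative to 2**bits levels
        levels = 2 ** bits - 1
        lo, hi = dydx.min(), dydx.max()
        scale = (hi - lo).clamp_min(1e-8) / levels
        codes = torch.round((dydx - lo) / scale).to(torch.uint8)  # few-bit storage
        ctx.save_for_backward(codes, lo, scale)
        return y

    @staticmethod
    def backward(ctx, grad_out):
        codes, lo, scale = ctx.saved_tensors
        dydx = codes.float() * scale + lo  # dequantized derivative
        return grad_out * dydx, None       # no gradient for `bits`
```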