|Trend|Dataset|Best Method|Paper title|Paper|Code|Compare|
|---|---|---|---|---|---|---|
The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.
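As an illustration of what quantizing without (re)training means in practice, here is a minimal sketch of symmetric, per-tensor uniform quantization applied directly to trained weights. This is a generic post-training scheme, not the specific method of the paper; the function names and the per-tensor scale are assumptions for the example.

```python
import torch

def quantize_tensor(w: torch.Tensor, num_bits: int = 8):
    """Uniformly quantize a float tensor to signed `num_bits` integers.

    Post-training quantization: only the trained weights are needed;
    no gradient updates or fine-tuning are performed.
    """
    qmax = 2 ** (num_bits - 1) - 1           # e.g. 127 for int8
    scale = w.abs().max() / qmax             # per-tensor scale (an assumption;
                                             # per-channel scales are also common)
    q = torch.clamp(torch.round(w / scale), -qmax, qmax)
    return q.to(torch.int8), scale

def dequantize_tensor(q: torch.Tensor, scale: torch.Tensor) -> torch.Tensor:
    """Recover an approximate float tensor from its quantized form."""
    return q.float() * scale

# Example: quantize one layer of a pretrained model without retraining.
w = torch.randn(64, 128)                     # stand-in for a trained weight matrix
q, scale = quantize_tensor(w, num_bits=8)
w_hat = dequantize_tensor(q, scale)
print(f"max abs error: {(w - w_hat).abs().max().item():.6f}")
```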
Before each weight update, targeted dropout stochastically selects a set of units or weights to drop using a simple self-reinforcing sparsity criterion, then computes gradients only for the remaining weights.
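A minimal sketch of the weight-level variant of this idea: the lowest-magnitude fraction of weights in each column are marked as candidates, and each candidate is dropped independently. The parameter names `gamma` (candidate fraction) and `alpha` (drop probability) and the per-column magnitude criterion are assumptions for this example.

```python
import torch

def targeted_weight_dropout(w: torch.Tensor, gamma: float = 0.5,
                            alpha: float = 0.5) -> torch.Tensor:
    """Apply targeted dropout to a weight matrix before the forward pass.

    The lowest-magnitude `gamma` fraction of weights in each column are
    dropout candidates; each candidate is zeroed with probability `alpha`.
    Gradients then flow only through surviving weights, reinforcing
    sparsity over the course of training.
    """
    num_candidates = int(gamma * w.shape[0])
    # Per-column magnitude threshold below which weights become candidates.
    thresh = torch.sort(w.abs(), dim=0).values[num_candidates - 1]
    candidate = w.abs() <= thresh
    # Drop each candidate independently with probability alpha.
    drop = candidate & (torch.rand_like(w) < alpha)
    return w * (~drop).float()

# The masked weights replace the raw weights in the forward pass, so the
# subsequent backward pass only updates the surviving connections.
w = torch.randn(256, 128, requires_grad=True)
w_masked = targeted_weight_dropout(w, gamma=0.5, alpha=0.5)
```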
Structured pruning is a popular method for compressing a neural network: given a large trained network, one alternates between removing channels and fine-tuning, thereby reducing the overall width of the network.
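A sketch of one pruning step in such a pipeline, using L1-norm filter importance (an assumed criterion; papers differ on how channels are scored). Fine-tuning alternates with repeated applications of this step.

```python
import torch
import torch.nn as nn

def prune_conv_channels(conv: nn.Conv2d, keep_ratio: float = 0.75) -> nn.Conv2d:
    """One structured-pruning step: drop the output channels whose filters
    have the smallest L1 norm, shrinking the layer's width.

    Note: the next layer's input channels must be pruned to match.
    """
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    importance = conv.weight.detach().abs().sum(dim=(1, 2, 3))  # per-filter L1
    keep = torch.topk(importance, n_keep).indices.sort().values
    pruned = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                       stride=conv.stride, padding=conv.padding,
                       bias=conv.bias is not None)
    pruned.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned.bias.data = conv.bias.data[keep].clone()
    return pruned

# The overall pipeline alternates pruning with fine-tuning:
#   for round in range(num_rounds):
#       model = prune_some_layers(model)   # remove low-importance channels
#       fine_tune(model, train_loader)     # recover accuracy before next round
```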
2) Cross-layer filter comparison is infeasible because filter importance is defined locally within each layer.
In this paper, we propose two novel network quantization approaches: single-level network quantization (SLQ) for high-bit quantization and multi-level network quantization (MLQ) for extremely low-bit (ternary) quantization. We are the first to consider network quantization at both the width and depth levels.
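For context on what "extremely low-bit (ternary)" means, here is a sketch of classic threshold-based ternarization (as in Ternary Weight Networks), which maps weights to {-alpha, 0, +alpha}. This illustrates ternary quantization in general, not the SLQ/MLQ procedure itself; the 0.7 threshold heuristic is a common convention, not taken from this paper.

```python
import torch

def ternarize(w: torch.Tensor) -> torch.Tensor:
    """Quantize weights to the ternary set {-alpha, 0, +alpha}."""
    delta = 0.7 * w.abs().mean()             # common heuristic threshold
    mask = (w.abs() > delta).float()         # which weights stay nonzero
    # Scaling factor: mean magnitude of the surviving weights.
    alpha = (w.abs() * mask).sum() / mask.sum().clamp(min=1)
    return alpha * torch.sign(w) * mask

w = torch.randn(64, 64)
w_t = ternarize(w)
print(torch.unique(w_t))                     # three values: -alpha, 0, +alpha
```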
While deep neural networks are a highly successful model class, their large memory footprint puts considerable strain on energy consumption, communication bandwidth, and storage requirements.