Quantization

636 papers with code • 4 benchmarks • 13 datasets

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
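
As a minimal, library-agnostic sketch of the idea (the helper names here are ours, not from the source paper), the following NumPy snippet maps a float32 array to int8 with an affine scale and zero-point, then approximately recovers the original values:

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 array to int8."""
    qmin, qmax = -128, 127
    scale = float(x.max() - x.min()) / (qmax - qmin)
    scale = scale if scale > 0 else 1e-8           # guard against constant inputs
    zero_point = int(round(qmin - float(x.min()) / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Approximate reconstruction of the original floats."""
    return scale * (q.astype(np.float32) - zero_point)

x = np.random.randn(4, 4).astype(np.float32)
q, s, z = quantize_int8(x)
print(np.abs(x - dequantize_int8(q, s, z)).max())  # worst-case error is about scale / 2
```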

Most implemented papers

FastText.zip: Compressing text classification models

facebookresearch/fastText 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.
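
The compression behind fastText.zip is built largely on product quantization of the embedding matrix. Below is a sketch of that general technique, not fastText's actual API: pq_compress and pq_decompress are hypothetical helpers, and the k-means comes from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

def pq_compress(emb: np.ndarray, m: int = 4, k: int = 256):
    """Split each embedding into m sub-vectors and run k-means per sub-space;
    each row is then stored as m one-byte centroid indices (requires k <= 256)."""
    n, d = emb.shape
    assert d % m == 0
    sub = d // m
    codebooks, codes = [], []
    for i in range(m):
        block = emb[:, i * sub:(i + 1) * sub]
        km = KMeans(n_clusters=k, n_init=1).fit(block)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))
    return codebooks, np.stack(codes, axis=1)

def pq_decompress(codebooks, codes):
    """Look each index back up in its codebook and concatenate the sub-vectors."""
    return np.hstack([cb[codes[:, i]] for i, cb in enumerate(codebooks)])

emb = np.random.randn(1000, 64).astype(np.float32)
codebooks, codes = pq_compress(emb, m=4, k=64)
approx = pq_decompress(codebooks, codes)  # 64 floats compressed to 4 bytes per row
```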

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

tensorflow/models CVPR 2018

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.
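
The paper's scheme represents a real value r as an integer q with q = round(r / S) + Z and carries out matrix multiplication in integer arithmetic only. A simplified NumPy sketch of that idea, with hypothetical helper names (deployed kernels replace the floating-point multiplier M with a fixed-point multiply and bit shift):

```python
import numpy as np

def quantize(r, scale, zp):
    """Affine quantization r -> int8: q = round(r / scale) + zp."""
    return np.clip(np.round(r / scale) + zp, -128, 127).astype(np.int8)

def int8_matmul(qa, za, sa, qb, zb, sb, sc, zc):
    """Integer-only matmul in the spirit of the paper: subtract zero-points,
    accumulate in int32, then rescale by M = sa * sb / sc into the output domain."""
    acc = (qa.astype(np.int32) - za) @ (qb.astype(np.int32) - zb)
    M = sa * sb / sc
    return np.clip(np.round(M * acc) + zc, -128, 127).astype(np.int8)

a = np.random.randn(2, 3).astype(np.float32)
b = np.random.randn(3, 4).astype(np.float32)
qa, qb = quantize(a, 0.02, 0), quantize(b, 0.02, 0)
qc = int8_matmul(qa, 0, 0.02, qb, 0, 0.02, 0.05, 0)
```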

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

pytorch/fairseq NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

NervanaSystems/distiller 1 Oct 2015

To address this limitation, we introduce "deep compression", a three-stage pipeline (pruning, trained quantization, and Huffman coding) whose stages work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
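
A sketch of the trained-quantization stage alone (pruning and Huffman coding omitted): cluster a layer's weights into 2^bits shared values with k-means, as the paper describes. share_weights is a hypothetical helper, and the k-means comes from scikit-learn.

```python
import numpy as np
from sklearn.cluster import KMeans

def share_weights(w: np.ndarray, bits: int = 4):
    """Cluster weights into 2**bits shared values; each weight is then
    stored as a `bits`-wide index into the shared codebook."""
    k = 2 ** bits
    km = KMeans(n_clusters=k, n_init=1).fit(w.reshape(-1, 1))
    codebook = km.cluster_centers_.ravel()
    codes = km.labels_.reshape(w.shape)
    return codebook, codes

w = np.random.randn(64, 64).astype(np.float32)
codebook, codes = share_weights(w, bits=4)
w_hat = codebook[codes]  # reconstructed layer using only 16 shared values
```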

DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients

tensorpack/tensorpack 20 Jun 2016

We propose DoReFa-Net, a method to train convolutional neural networks that have low bitwidth weights and activations using low bitwidth parameter gradients.
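
The paper's quantizer maps a value in [0, 1] onto 2^k uniform levels and trains through the non-differentiable rounding with a straight-through estimator. A minimal PyTorch sketch following the paper's weight-quantization equations (function names are ours):

```python
import torch

def quantize_k(r: torch.Tensor, k: int) -> torch.Tensor:
    """k-bit uniform quantizer for r in [0, 1]; the detach() trick is a
    straight-through estimator, so round() passes gradients through unchanged."""
    n = float(2 ** k - 1)
    q = torch.round(r * n) / n
    return r + (q - r).detach()

def quantize_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    """Weights: tanh-normalize into [0, 1], quantize, rescale to [-1, 1]."""
    t = torch.tanh(w)
    r = t / (2 * t.abs().max()) + 0.5
    return 2 * quantize_k(r, k) - 1
```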

Billion-scale similarity search with GPUs

facebookresearch/faiss 28 Feb 2017

Similarity search finds application in specialized database systems handling complex data such as images or videos, which are typically represented by high-dimensional features and require specific indexing structures.
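
For collections that no longer fit in memory uncompressed, faiss pairs an inverted-file index with product quantization. A short usage sketch, assuming the faiss Python package is installed; the sizes and parameters below are illustrative, not recommendations:

```python
import faiss
import numpy as np

d, nlist, m = 64, 100, 8                  # dims, coarse cells, PQ sub-vectors
xb = np.random.rand(10000, d).astype('float32')

quantizer = faiss.IndexFlatL2(d)          # coarse quantizer for the IVF lists
index = faiss.IndexIVFPQ(quantizer, d, nlist, m, 8)  # 8 bits per sub-vector code
index.train(xb)
index.add(xb)

D, I = index.search(xb[:5], 4)            # 4 nearest neighbors per query
```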

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

mit-han-lab/once-for-all CVPR 2019

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

Polysemous codes

facebookresearch/faiss 7 Sep 2016

This paper considers the problem of approximate nearest neighbor search in the compressed domain.

Learned Step Size Quantization

zhutmost/lsq-net ICLR 2020

Deep networks that run with low-precision operations at inference time offer power and space advantages over high-precision alternatives, but must overcome the challenge of maintaining high accuracy as precision decreases.
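
LSQ makes the quantizer step size a learnable parameter trained alongside the weights. A minimal PyTorch sketch of the idea, using the paper's gradient-scale heuristic and a straight-through round (helper names are ours):

```python
import torch

def grad_scale(x: torch.Tensor, g: float) -> torch.Tensor:
    # forward: returns x unchanged; backward: gradient is multiplied by g
    return (x - x * g).detach() + x * g

def lsq_quantize(v: torch.Tensor, s: torch.Tensor, bits: int = 8) -> torch.Tensor:
    """LSQ-style quantizer: s is a learnable step size (an nn.Parameter)."""
    qn, qp = 2 ** (bits - 1), 2 ** (bits - 1) - 1
    g = 1.0 / (v.numel() * qp) ** 0.5      # step-size gradient scale from the paper
    s = grad_scale(s, g)
    q = torch.clamp(v / s, -qn, qp)
    q = q + (torch.round(q) - q).detach()  # straight-through round
    return q * s

# per the paper, s can be initialized as 2 * v.abs().mean() / qp ** 0.5
```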

Improvements to Target-Based 3D LiDAR to Camera Calibration

UMich-BipedLab/extrinsic_lidar_camera_calibration 7 Oct 2019

The homogeneous transformation between a LiDAR and a monocular camera is required for sensor fusion tasks, such as SLAM.