Quantization

449 papers with code • 2 benchmarks • 11 datasets

Quantization is a promising technique for reducing the computation cost of neural network training: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
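
To make the mapping concrete, here is a minimal NumPy sketch (the symmetric per-tensor scale is an assumption, not a prescribed scheme) that converts a float32 tensor to int8 and back, exposing the rounding error that quantization trades for cheaper arithmetic:

    import numpy as np

    x = np.random.randn(4, 4).astype(np.float32)  # original float32 tensor

    # Assumed symmetric per-tensor scheme: one scale maps [-max|x|, max|x|] to [-127, 127].
    scale = np.abs(x).max() / 127.0
    x_int8 = np.clip(np.round(x / scale), -127, 127).astype(np.int8)  # low-cost representation
    x_hat = x_int8.astype(np.float32) * scale                         # dequantized approximation

    print("max rounding error:", np.abs(x - x_hat).max())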

Greatest papers with code

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

tensorflow/models CVPR 2018

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.

Tasks: General Classification, Quantization
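
The paper's scheme represents real values as r = S·(q − Z), with a floating-point scale S and an integer zero-point Z, and simulates this rounding during training ("fake quantization"). A minimal NumPy sketch of that affine mapping, leaving out the paper's full training pipeline:

    import numpy as np

    def affine_quantize(r, num_bits=8):
        # Affine scheme r = S * (q - Z): choose S and Z so the observed
        # float range [r_min, r_max] maps onto the integer range [0, 2^b - 1].
        qmin, qmax = 0, 2 ** num_bits - 1
        r_min, r_max = min(r.min(), 0.0), max(r.max(), 0.0)  # range must contain 0
        S = (r_max - r_min) / (qmax - qmin)
        Z = int(round(qmin - r_min / S))
        q = np.clip(np.round(r / S) + Z, qmin, qmax).astype(np.uint8)
        return q, S, Z

    def fake_quantize(r, num_bits=8):
        # Quantize-dequantize in the forward pass so training "sees" the
        # rounding that integer-only inference will introduce.
        q, S, Z = affine_quantize(r, num_bits)
        return (q.astype(np.float32) - Z) * S

    r = np.random.randn(8).astype(np.float32)
    print(fake_quantize(r))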

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

huggingface/pytorch-transformers 20 Apr 2020

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions.

Tasks: Quantization
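
Among the recipes the paper evaluates are max calibration and per-channel weight scales. A rough NumPy sketch of per-channel symmetric int8 quantization (the function and tensor sizes here are illustrative, not the paper's code):

    import numpy as np

    def quantize_per_channel(w):
        # Per-channel scales: each output row gets its own max-calibrated
        # scale, which typically recovers accuracy lost to a single
        # per-tensor scale while still enabling integer instructions.
        scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
        q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
        return q, scales

    w = np.random.randn(64, 128).astype(np.float32)  # stand-in for a layer's weights
    q, scales = quantize_per_channel(w)
    w_hat = q.astype(np.float32) * scales
    print("per-channel max error:", np.abs(w - w_hat).max())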

Unsupervised Cross-lingual Representation Learning for Speech Recognition

huggingface/transformers 24 Jun 2020

This paper presents XLSR, which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Tasks: Quantization, Representation Learning (+1 more)

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

huggingface/transformers NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Tasks: Fine-tuning, Quantization (+2 more)
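
wav2vec 2.0 discretizes its latent speech representations by choosing codebook entries via a Gumbel softmax. The sketch below illustrates only that hard selection step in NumPy; the codebook size is a placeholder, and the real model backpropagates through the soft probabilities:

    import numpy as np

    rng = np.random.default_rng(0)

    def gumbel_argmax(logits, tau=1.0):
        # Perturb the logits with Gumbel noise; the argmax is then a sample
        # from the softmax distribution (the "hard" codebook choice).
        g = -np.log(-np.log(rng.uniform(size=logits.shape)))
        return np.argmax((logits + g) / tau)

    codebook = rng.normal(size=(320, 128))  # placeholder: 320 entries of dim 128
    logits = rng.normal(size=320)           # a latent frame's score for each entry
    idx = gumbel_argmax(logits)
    quantized = codebook[idx]               # discrete unit used as the contrastive target
    print("selected entry:", idx)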

I-BERT: Integer-only BERT Quantization

huggingface/transformers 5 Jan 2021

Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks.

Tasks: Natural Language Inference, Natural Language Understanding (+1 more)
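
I-BERT runs Transformer inference with integer-only arithmetic. A common building block of such pipelines, sketched generically below rather than as the paper's kernels, is an int8 matmul that accumulates in int32 and requantizes with a precomputed combined scale:

    import numpy as np

    def int8_linear(x_q, w_q, x_scale, w_scale, out_scale):
        # Integer-only linear layer: the matmul accumulates in int32, then
        # one combined multiplier folds the input, weight, and output scales
        # into a single requantization step back to int8. (Real integer-only
        # kernels implement this multiplier with fixed-point arithmetic.)
        acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
        multiplier = (x_scale * w_scale) / out_scale  # precomputed offline
        return np.clip(np.round(acc * multiplier), -127, 127).astype(np.int8)

    x_q = np.random.randint(-127, 128, size=(2, 64), dtype=np.int8)
    w_q = np.random.randint(-127, 128, size=(64, 64), dtype=np.int8)
    y_q = int8_linear(x_q, w_q, x_scale=0.02, w_scale=0.01, out_scale=0.1)
    print(y_q.shape, y_q.dtype)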

FastText.zip: Compressing text classification models

facebookresearch/fastText 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

Tasks: Classification, General Classification (+3 more)
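
FastText.zip leans on product quantization (alongside pruning and hashing) to shrink the embedding matrix. A rough sketch of the core PQ step, with illustrative sizes and SciPy's k-means standing in for the paper's quantizer training:

    import numpy as np
    from scipy.cluster.vq import kmeans2

    rng = np.random.default_rng(0)
    emb = rng.normal(size=(1000, 64)).astype(np.float32)  # toy embedding matrix

    m, k = 4, 256            # 4 sub-vectors per embedding, 256 centroids each
    d_sub = emb.shape[1] // m
    codes = np.empty((emb.shape[0], m), dtype=np.uint8)   # 4 bytes/vector vs. 256
    codebooks = []
    for j in range(m):
        sub = emb[:, j * d_sub:(j + 1) * d_sub]
        centroids, labels = kmeans2(sub, k, minit='points')
        codebooks.append(centroids)
        codes[:, j] = labels

    # Reconstruction: concatenate the chosen centroid for each sub-vector.
    recon = np.hstack([codebooks[j][codes[:, j]] for j in range(m)])
    print("mean reconstruction error:", np.abs(emb - recon).mean())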

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

google-research/google-research 7 May 2021

In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves.

Tasks: Quantization
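
A toy version of the kind of bit-width sweep behind such tradeoff curves, using a symmetric uniform quantizer on a random tensor rather than the paper's ResNet setup:

    import numpy as np

    x = np.random.randn(10000).astype(np.float32)

    for bits in (2, 4, 8):
        # Symmetric uniform quantizer with 2^(bits-1) - 1 positive levels.
        levels = 2 ** (bits - 1) - 1
        scale = np.abs(x).max() / levels
        x_hat = np.clip(np.round(x / scale), -levels, levels) * scale
        err = np.sqrt(np.mean((x - x_hat) ** 2))
        print(f"{bits}-bit: {bits / 32:.3f}x float32 size, rmse={err:.4f}")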

ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution

google-research/google-research 19 Jan 2021

We consider the problem of efficient blackbox optimization over a large hybrid search space, consisting of a mixture of a high dimensional continuous space and a complex combinatorial space.

Tasks: Combinatorial Optimization, Continuous Control (+3 more)
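
Purely as a generic illustration of such a hybrid space, and not the ES-ENAS algorithm itself, the sketch below mutates a candidate that pairs a continuous parameter vector with discrete operation choices and keeps the better of the two under a stand-in blackbox objective:

    import numpy as np

    rng = np.random.default_rng(0)
    OPS = ["conv3x3", "conv5x5", "skip"]   # toy combinatorial choices

    def sample_candidate():
        # A hybrid candidate: a continuous vector plus discrete ops.
        return rng.normal(size=16), [rng.choice(OPS) for _ in range(4)]

    def mutate(theta, ops, sigma=0.1):
        # Perturb the continuous part; resample one combinatorial slot.
        new_theta = theta + sigma * rng.normal(size=theta.shape)
        new_ops = list(ops)
        new_ops[rng.integers(len(ops))] = rng.choice(OPS)
        return new_theta, new_ops

    def blackbox(theta, ops):              # stand-in objective
        return -np.sum(theta ** 2) + ops.count("skip")

    best = sample_candidate()
    for _ in range(200):
        cand = mutate(*best)
        if blackbox(*cand) > blackbox(*best):
            best = cand
    print("best ops:", best[1])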

What Do Compressed Deep Neural Networks Forget?

google-research/google-research 13 Nov 2019

Top-line accuracy, however, conceals significant differences in how different classes and images are impacted by model compression techniques.

Tasks: Fairness, Interpretability Techniques for Deep Learning (+4 more)
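
A minimal sketch of the kind of per-class measurement the paper argues for, with synthetic prediction arrays standing in for a full and a compressed model:

    import numpy as np

    rng = np.random.default_rng(0)
    labels = rng.integers(0, 10, size=5000)                  # hypothetical test labels
    pred_full = np.where(rng.random(5000) < 0.95, labels, (labels + 1) % 10)
    pred_small = pred_full.copy()
    flip = (labels == 3) & (rng.random(5000) < 0.3)          # compression hurts class 3
    pred_small[flip] = (labels[flip] + 1) % 10

    # Overall accuracy barely moves, but the per-class breakdown reveals the damage.
    print("overall drop:", (pred_full == labels).mean() - (pred_small == labels).mean())
    for c in range(10):
        m = labels == c
        delta = (pred_small[m] == c).mean() - (pred_full[m] == c).mean()
        print(f"class {c}: accuracy change {delta:+.3f}")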

Link and code: Fast indexing with graphs and compact regression codes

facebookresearch/faiss CVPR 2018

Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, leaving aside the memory requirements.

Tasks: Image Similarity Search, Quantization
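
Link and Code builds on graph-based search (HNSW) and adds compact regression codes. The sketch below covers only the graph side, using the standard faiss HNSW index with generic parameters rather than the paper's configuration:

    import numpy as np
    import faiss  # pip install faiss-cpu

    d = 64
    xb = np.random.rand(10000, d).astype('float32')  # database vectors
    xq = np.random.rand(5, d).astype('float32')      # query vectors

    index = faiss.IndexHNSWFlat(d, 32)  # HNSW graph, 32 links per node
    index.add(xb)                       # build the graph incrementally
    D, I = index.search(xq, 10)         # graph-walk search for 10 neighbors
    print(I[0])                         # ids of the nearest neighbors to xq[0]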