449 papers with code • 2 benchmarks • 11 datasets

Quantization is a promising technique for reducing the computational cost of neural network training and inference: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).
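As a minimal sketch of this idea, the snippet below maps float32 values to int8 using a symmetric per-tensor scale and then dequantizes them back. This is one common scheme; the function names and the choice of symmetric scaling here are illustrative, not taken from any specific paper on this page.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric int8 quantization: q = round(x / scale), scale = max|x| / 127."""
    scale = max(np.abs(x).max() / 127.0, 1e-8)  # guard against all-zero input
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from int8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(16).astype(np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
# Per-element reconstruction error is bounded by scale / 2.
```

The int8 codes occupy a quarter of the memory of float32, at the cost of a rounding error proportional to the scale.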

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers

Greatest papers with code

Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference

tensorflow/models CVPR 2018

The rising popularity of intelligent mobile devices and the daunting computational cost of deep learning-based models call for efficient and accurate on-device inference schemes.

General Classification · Quantization

Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation

huggingface/pytorch-transformers 20 Apr 2020

Quantization techniques can reduce the size of Deep Neural Networks and improve inference latency and throughput by taking advantage of high throughput integer instructions.
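A hedged illustration of that point: the sketch below runs one linear layer with int8 operands and int32 accumulation (the high-throughput integer path), then rescales the accumulator back to float. The function name and per-tensor scales are assumptions for the example, not the paper's exact recipe.

```python
import numpy as np

def int8_linear(x_q: np.ndarray, w_q: np.ndarray,
                x_scale: float, w_scale: float) -> np.ndarray:
    """Integer matmul for a quantized linear layer.

    x_q, w_q : int8 activations and weights
    The product is accumulated in int32 (hardware integer instructions),
    then dequantized by the combined scale.
    """
    acc = x_q.astype(np.int32) @ w_q.astype(np.int32)
    return acc.astype(np.float32) * (x_scale * w_scale)

# Toy operands with known scales:
x_q = np.array([[10, -20], [30, 40]], dtype=np.int8)
w_q = np.array([[1, 0], [0, 1]], dtype=np.int8)
y = int8_linear(x_q, w_q, x_scale=0.1, w_scale=0.5)
```

Accumulating in int32 avoids overflow of int8 products, and the single float rescale at the end is what lets the bulk of the arithmetic stay in integer units.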


Unsupervised Cross-lingual Representation Learning for Speech Recognition

huggingface/transformers 24 Jun 2020

This paper presents XLSR which learns cross-lingual speech representations by pretraining a single model from the raw waveform of speech in multiple languages.

Quantization · Representation Learning +1

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

huggingface/transformers NeurIPS 2020

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler.

 Ranked #1 on Speech Recognition on TIMIT (using extra training data)

Fine-tuning · Quantization +2

I-BERT: Integer-only BERT Quantization

huggingface/transformers 5 Jan 2021

Transformer-based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks.

Natural Language Inference · Natural Language Understanding +1

FastText.zip: Compressing text classification models

facebookresearch/fastText 12 Dec 2016

We consider the problem of producing compact architectures for text classification, such that the full model fits in a limited amount of memory.

Classification · General Classification +3

Pareto-Optimal Quantized ResNet Is Mostly 4-bit

google-research/google-research 7 May 2021

In this work, we use ResNet as a case study to systematically investigate the effects of quantization on inference compute cost-quality tradeoff curves.


ES-ENAS: Blackbox Optimization over Hybrid Spaces via Combinatorial and Continuous Evolution

google-research/google-research 19 Jan 2021

We consider the problem of efficient blackbox optimization over a large hybrid search space, consisting of a mixture of a high dimensional continuous space and a complex combinatorial space.

Combinatorial Optimization · Continuous Control +3

What Do Compressed Deep Neural Networks Forget?

google-research/google-research 13 Nov 2019

However, this measure of performance conceals significant differences in how different classes and images are impacted by model compression techniques.

Fairness · Interpretability Techniques for Deep Learning +4

Link and code: Fast indexing with graphs and compact regression codes

facebookresearch/faiss CVPR 2018

Similarity search approaches based on graph walks have recently attained outstanding speed-accuracy trade-offs, setting aside the memory requirements.

Image Similarity Search · Quantization