Quantization
1032 papers with code • 10 benchmarks • 18 datasets
Quantization is a promising technique for reducing the computation cost of neural network training and inference: it replaces high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).
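In its most common form this means mapping a float32 tensor onto a small integer grid with a scale factor. Below is a minimal sketch, assuming symmetric per-tensor int8 quantization; it is illustrative only and not tied to any particular paper listed on this page.

```python
# Minimal sketch: symmetric per-tensor int8 quantization (illustrative assumption).
import numpy as np

def quantize_int8(x: np.ndarray):
    """Map float32 values to int8 codes with a single per-tensor scale."""
    max_abs = np.abs(x).max()
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float32 values from the int8 codes."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(x)
x_hat = dequantize(q, s)
print("max abs quantization error:", np.abs(x - x_hat).max())
```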
Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
Libraries
Use these libraries to find Quantization models and implementations.
Latest papers
Efficient Multi-Vector Dense Retrieval Using Bit Vectors
This paper proposes "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval.
Minimize Quantization Output Error with Bias Compensation
Quantization is a promising method that reduces the memory usage and computational intensity of Deep Neural Networks (DNNs), but it often leads to significant output errors that hinder model deployment.
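The title points to compensating that error with a bias term. The sketch below shows the general bias-correction idea, assuming a simple int8 weight quantizer and a synthetic calibration set; it illustrates the concept only and is not the paper's exact algorithm.

```python
# Hedged sketch: generic bias correction after weight quantization
# (general idea only, not this paper's specific method).
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 128)).astype(np.float32)       # float weights
X = rng.standard_normal((256, 128)).astype(np.float32)      # calibration inputs

# Naive symmetric int8 quantization of the weights.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127) * scale

# Output error introduced by quantization ...
err = X @ W.T - X @ W_q.T
# ... and a bias term that cancels its mean over the calibration set.
bias_comp = err.mean(axis=0)

corrected = X @ W_q.T + bias_comp
print("mean |error| before:", np.abs(err).mean())
print("mean |error| after :", np.abs(X @ W.T - corrected).mean())
```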
Transformer based Pluralistic Image Completion with Reduced Information Loss
The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.
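The quantize-then-tokenize step can be pictured as snapping each pixel to its nearest entry in a small codebook and feeding the entry indices to the transformer. A toy sketch of that general idea follows, with an assumed random codebook; it is not this paper's specific pipeline.

```python
# Toy sketch: quantized-pixel indices as transformer token ids
# (generic VQ-token idea, assumed codebook, not the paper's pipeline).
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.uniform(0, 1, size=(512, 3)).astype(np.float32)   # 512 RGB codes
image = rng.uniform(0, 1, size=(16, 16, 3)).astype(np.float32)   # toy image

pixels = image.reshape(-1, 3)                                    # (256, 3)
dists = ((pixels[:, None, :] - codebook[None]) ** 2).sum(-1)     # (256, 512)
tokens = dists.argmin(axis=1)                                    # one token id per pixel

print(tokens.shape, tokens[:8])  # these ids serve as transformer inputs / prediction targets
```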
QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits.
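The intuition behind rotation-based schemes is that an orthogonal rotation can be inserted into a linear layer without changing its output, while spreading outlier channels so that low-bit quantization loses less information. The toy sketch below uses a random orthogonal matrix and a crude symmetric 4-bit quantizer to show the effect; it is an assumption-laden illustration, not QuaRot's actual Hadamard-based construction.

```python
# Toy sketch: an orthogonal rotation Q spreads outlier channels before 4-bit
# quantization, and is exactly invertible since (x Q)(Q^T W) = x W.
# Illustrative only, not QuaRot's construction.
import numpy as np

rng = np.random.default_rng(0)
d = 64
x = rng.standard_normal((8, d)).astype(np.float32)
x[:, 3] *= 50.0  # an "outlier" channel that dominates the per-tensor scale

Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal rotation

def quant_4bit(a):
    """Crude symmetric 4-bit fake quantization."""
    s = np.abs(a).max() / 7.0
    return np.clip(np.round(a / s), -7, 7) * s

err_plain = np.abs(x - quant_4bit(x)).mean()
x_rot = x @ Q
err_rot = np.abs(x_rot - quant_4bit(x_rot)).mean()
print(f"mean 4-bit error without rotation: {err_plain:.4f}")
print(f"mean 4-bit error with rotation:    {err_rot:.4f}")
```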
Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers
Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs.
QNCD: Quantization Noise Correction for Diffusion Models
Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity.
The Unreasonable Ineffectiveness of the Deeper Layers
We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.
HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression
3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.
AffineQuant: Affine Transformation Quantization for Large Language Models
Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training.
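PTQ operates on an already-trained model: a small calibration set is passed through the network to choose quantization scales, with no retraining. The sketch below shows that generic workflow on a single synthetic layer; it is a minimal assumption-based illustration, not AffineQuant's affine-transformation method.

```python
# Minimal sketch of a generic post-training quantization (PTQ) workflow
# (calibrate scales on held-out data, no retraining); not AffineQuant itself.
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((32, 16)).astype(np.float32)          # "pretrained" weights
calib = rng.standard_normal((128, 16)).astype(np.float32)     # small calibration set

# 1. Calibration: observe the activation range on real(istic) data.
acts = calib @ W.T
act_scale = np.abs(acts).max() / 127.0

# 2. Quantize weights, then simulate int8 activations with the calibrated scale.
w_scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / w_scale), -127, 127) * w_scale
acts_q = np.clip(np.round(calib @ W_q.T / act_scale), -127, 127) * act_scale

print("weight quantization error:    ", np.abs(W - W_q).mean())
print("activation quantization error:", np.abs(acts - acts_q).mean())
```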
NoisyDECOLLE: Robust Local Learning for SNNs on Neuromorphic Hardware
However, mapping these algorithms to neuromorphic systems to unleash their potential can be impaired by various kinds of noise.