Quantization

1032 papers with code • 10 benchmarks • 18 datasets

Quantization is a promising technique for reducing the computation cost of neural network training, replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
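
As a minimal sketch of what this looks like in practice — symmetric per-tensor quantization of a float32 array to int8 — the following is illustrative (the function names and per-tensor scale choice are assumptions, not taken from the source paper):

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus a scale."""
    scale = np.abs(x).max() / 127.0          # largest magnitude maps to 127
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover a float32 approximation of the original values."""
    return q.astype(np.float32) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, scale)).max())
```

Integer arithmetic on `q` (with `scale` folded in at the end) is what makes int8 kernels cheaper than their float32 counterparts.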

Efficient Multi-Vector Dense Retrieval Using Bit Vectors

cosimorulli/emvb 3 Apr 2024

This paper proposes "Efficient Multi-Vector dense retrieval with Bit vectors" (EMVB), a novel framework for efficient query processing in multi-vector dense retrieval.

48 stars
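
EMVB's exact query-processing pipeline is in the paper; the basic trick suggested by the title — replacing float similarity computations with cheap bit-vector operations for candidate filtering — can be sketched as follows (sign binarization and Hamming scoring are illustrative assumptions, not necessarily EMVB's design):

```python
import numpy as np

def binarize(v: np.ndarray) -> np.ndarray:
    """Pack the sign pattern of float vectors into bits (1 bit per dimension)."""
    return np.packbits(v > 0, axis=-1)

def hamming_sim(a: np.ndarray, b: np.ndarray) -> int:
    """Matching-bit count; a cheap proxy for dot-product similarity."""
    return a.size * 8 - int(np.unpackbits(np.bitwise_xor(a, b)).sum())

docs = np.random.randn(1000, 128).astype(np.float32)   # document embeddings
query = np.random.randn(128).astype(np.float32)
doc_bits, q_bits = binarize(docs), binarize(query)     # 128 floats -> 16 bytes each

scores = [hamming_sim(q_bits, d) for d in doc_bits]
candidate = int(np.argmax(scores))                     # then re-score with full floats
```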

Minimize Quantization Output Error with Bias Compensation

gongcheng1919/bias-compensation 2 Apr 2024

Quantization is a promising method that reduces the memory usage and computational intensity of Deep Neural Networks (DNNs), but it often leads to significant output error that hinders model deployment.

2 stars
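
The general bias-compensation idea — calibrate a per-output bias that cancels the average error a quantized layer introduces — can be sketched as follows, assuming a plain linear layer and fake quantization (the paper's actual procedure differs in its details):

```python
import numpy as np

def fake_quantize(w: np.ndarray) -> np.ndarray:
    """Round weights to an int8 grid and map back to float."""
    scale = np.abs(w).max() / 127.0
    return np.clip(np.round(w / scale), -127, 127) * scale

W = np.random.randn(64, 128).astype(np.float32)
Wq = fake_quantize(W)
X = np.abs(np.random.randn(256, 128)).astype(np.float32)  # calibration batch (e.g. post-ReLU)

# Bias that cancels the mean output error caused by quantized weights.
bias_comp = (X @ W.T - X @ Wq.T).mean(axis=0)

err_plain = np.abs(X @ W.T - X @ Wq.T).mean()
err_comp = np.abs(X @ W.T - (X @ Wq.T + bias_comp)).mean()
print(err_plain, err_comp)   # compensated error is typically smaller
```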

Transformer based Pluralistic Image Completion with Reduced Information Loss

liuqk3/put 31 Mar 2024

The indices of quantized pixels are used as tokens for the inputs and prediction targets of the transformer.

147 stars
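
The quantized-pixels-as-tokens mechanism is vector quantization against a codebook: each feature vector is replaced by the integer index of its nearest codebook entry. A minimal sketch (the random codebook and shapes are stand-ins):

```python
import numpy as np

codebook = np.random.randn(512, 64).astype(np.float32)   # 512 entries, 64-d each

def to_tokens(features: np.ndarray) -> np.ndarray:
    """Index of the nearest codebook entry for each feature vector."""
    d = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    return d.argmin(axis=1)                               # integer transformer tokens

def from_tokens(tokens: np.ndarray) -> np.ndarray:
    """Decode tokens back to their (quantized) feature vectors."""
    return codebook[tokens]

patch_feats = np.random.randn(16, 64).astype(np.float32)  # e.g. one image's patches
tokens = to_tokens(patch_feats)     # serve as transformer inputs and targets
recon = from_tokens(tokens)
```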

QuaRot: Outlier-Free 4-Bit Inference in Rotated LLMs

spcl/quarot 30 Mar 2024

We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits.

123 stars
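
The effect of the rotations can be shown in a few lines: multiplying by an orthogonal matrix spreads an outlier's energy across all channels, so a coarse 4-bit grid fits the rotated values much better. This sketch uses a random orthogonal matrix instead of the Hadamard-based construction in the paper:

```python
import numpy as np

def fake_quantize(x: np.ndarray, bits: int = 4) -> np.ndarray:
    """Symmetric fake quantization onto a 2**bits-level grid."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

rng = np.random.default_rng(0)
x = rng.standard_normal(256)
x[3] = 40.0                                    # one activation outlier

Q, _ = np.linalg.qr(rng.standard_normal((256, 256)))   # random orthogonal rotation

err_plain = np.abs(x - fake_quantize(x)).mean()
# Rotate, quantize, rotate back: the outlier no longer dictates the scale.
err_rotated = np.abs(x - Q @ fake_quantize(Q.T @ x)).mean()
print(err_plain, err_rotated)                  # rotated error is typically far smaller
```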

Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers

pingchengdong/gqa-lut 28 Mar 2024

Non-linear functions are prevalent in Transformers and their lightweight variants, incurring substantial and frequently underestimated hardware costs.

6 stars
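
The genetic search is beyond a short snippet, but the hardware-friendly primitive being optimized — approximating a non-linear function such as GELU with a small lookup table plus interpolation — is easy to sketch (uniform breakpoints here; the paper searches for better ones):

```python
import numpy as np

def gelu(x):
    """tanh approximation of GELU."""
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x ** 3)))

xs = np.linspace(-4, 4, 17)       # 17 breakpoints over the active range
table = gelu(xs)                  # precomputed once, stored on-chip

def gelu_lut(x):
    """Piecewise-linear LUT approximation of GELU on [-4, 4]."""
    return np.interp(np.clip(x, -4, 4), xs, table)

x = np.linspace(-4, 4, 1000)
print("max abs error:", np.abs(gelu(x) - gelu_lut(x)).max())
```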

QNCD: Quantization Noise Correction for Diffusion Models

huanpengchu/qncd 28 Mar 2024

Diffusion models have revolutionized image synthesis, setting new benchmarks in quality and creativity.

5 stars

The Unreasonable Ineffectiveness of the Deeper Layers

arcee-ai/PruneMe 26 Mar 2024

We empirically study a simple layer-pruning strategy for popular families of open-weight pretrained LLMs, finding minimal degradation of performance on different question-answering benchmarks until after a large fraction (up to half) of the layers are removed.

5 stars
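
A minimal sketch of similarity-based block selection in the spirit of the paper: drop the contiguous block of layers across which the hidden representation changes least (random activations stand in for real ones, and cosine similarity stands in for the paper's angular-distance measure):

```python
import numpy as np

def block_to_prune(acts: np.ndarray, n_drop: int) -> int:
    """acts[i] is the hidden state entering layer i (acts[-1] is the output).
    Return the start of the n_drop-layer block whose removal changes the
    representation least, i.e. with the highest input/output similarity."""
    best, best_sim = 0, -np.inf
    for i in range(acts.shape[0] - n_drop):
        a, b = acts[i], acts[i + n_drop]
        sim = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
        if sim > best_sim:
            best, best_sim = i, sim
    return best

acts = np.random.randn(33, 4096).astype(np.float32)   # e.g. a 32-layer model
start = block_to_prune(acts, n_drop=8)
print(f"prune layers {start}..{start + 7}")
```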

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

yihangchen-ee/hac 21 Mar 2024

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.

87 stars

AffineQuant: Affine Transformation Quantization for Large Language Models

bytedance/affinequant 19 Mar 2024

Among these techniques, Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its noteworthy compression efficiency and cost-effectiveness in the context of training.

6 stars

NoisyDECOLLE: Robust Local Learning for SNNs on Neuromorphic Hardware

RWTH-IDS/noisy-decolle International Conference on Machine Learning and Applications (ICMLA) 2024

However, mapping these algorithms to neuromorphic systems to unleash their potential can be impaired by various kinds of noise.

2 stars · 19 Mar 2024