Quantization

1003 papers with code • 9 benchmarks • 17 datasets

Quantization is a promising technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
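
To make the definition above concrete, here is a minimal NumPy sketch of affine float32-to-int8 quantization and its inverse. The function names are illustrative, not from any library:

```python
import numpy as np

def quantize_int8(x):
    """Affine (asymmetric) quantization of a float32 tensor to int8."""
    scale = (x.max() - x.min()) / 255.0            # real-valued width of one int8 step
    scale = max(scale, 1e-12)                      # guard constant tensors
    zero_point = np.round(-x.min() / scale) - 128  # int8 code that real 0.0 maps to
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover a float32 approximation of the original tensor."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(256).astype(np.float32)
q, s, zp = quantize_int8(x)
err = np.abs(x - dequantize_int8(q, s, zp)).max()
print(f"max reconstruction error: {err:.5f}")      # bounded by about scale / 2
```

Storing `q` instead of `x` cuts memory four-fold (1 byte vs. 4 per element), and integer arithmetic is correspondingly cheaper on hardware with int8 units.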

HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression

yihangchen-ee/hac 21 Mar 2024

3D Gaussian Splatting (3DGS) has emerged as a promising framework for novel view synthesis, boasting rapid rendering speed with high fidelity.

60 stars

AffineQuant: Affine Transformation Quantization for Large Language Models

bytedance/affinequant 19 Mar 2024

Post-Training Quantization (PTQ) has emerged as a subject of considerable interest due to its compression efficiency and its cost-effectiveness: the model is quantized after training, with no retraining required (a minimal sketch follows this entry).

3 stars
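
A sketch of the generic PTQ idea referenced above: quantize the weights of an already-trained layer with simple min-max calibration, one scale per output channel. This is background only, not AffineQuant's learned affine transforms:

```python
import numpy as np

def ptq_weights_per_channel(w):
    """Symmetric int8 post-training quantization, one scale per output channel.

    w: trained float32 weights of shape (out_features, in_features).
    """
    scales = np.abs(w).max(axis=1, keepdims=True) / 127.0
    scales = np.maximum(scales, 1e-12)             # guard all-zero channels
    q = np.clip(np.round(w / scales), -127, 127).astype(np.int8)
    return q, scales

w = np.random.randn(64, 128).astype(np.float32)    # stand-in for trained weights
q, scales = ptq_weights_per_channel(w)
w_hat = q.astype(np.float32) * scales              # dequantized view
print(np.abs(w - w_hat).max())                     # per-channel error <= scale / 2
```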

Self-Supervised Quantization-Aware Knowledge Distillation

kaiqi123/sqakd 17 Mar 2024

This work combines quantization-aware training (QAT) and knowledge distillation (KD) to build competitive low-bit deep learning models (the generic recipe is sketched after this entry).

2 stars
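
The generic QAT-plus-KD recipe behind the entry above, sketched in PyTorch: fake-quantize tensors in the forward pass with a straight-through estimator, and distill from the full-precision network so no labels are needed. This is an illustrative baseline, not the paper's SQAKD procedure:

```python
import torch
import torch.nn.functional as F

def fake_quant(x, bits=4):
    """Simulate low-bit quantization during training (QAT).

    The forward pass uses the quantized value; the straight-through
    estimator lets gradients flow through round() as if it were identity.
    """
    qmax = 2 ** (bits - 1) - 1
    scale = x.detach().abs().max() / qmax
    q = torch.clamp(torch.round(x / scale), -qmax, qmax) * scale
    return x + (q - x).detach()

def distill_loss(student_logits, teacher_logits, T=4.0):
    """KL divergence between a full-precision teacher and a low-bit student."""
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * T * T
```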

TaxoLLaMA: WordNet-based Model for Solving Multiple Lexical Semantic Tasks

vityavitalich/taxollama 14 Mar 2024

It achieves 11 SotA results and 4 top-2 results across 16 tasks spanning Taxonomy Enrichment, Hypernym Discovery, Taxonomy Construction, and Lexical Entailment.

3 stars

Adversarial Fine-tuning of Compressed Neural Networks for Joint Improvement of Robustness and Efficiency

saintslab/adver-fine 14 Mar 2024

We present experiments on two benchmark datasets showing that adversarial fine-tuning of compressed models can achieve robustness performance comparable to adversarially trained models, while also improving computational efficiency.

0 stars

Chronos: Learning the Language of Time Series

amazon-science/chronos-forecasting 12 Mar 2024

We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.

1,256 stars

GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLM

haokang-timmy/gear 8 Mar 2024

Key-value (KV) caching has become the de facto technique for accelerating generation in large language model (LLM) inference (a background sketch of KV cache quantization follows this entry).

50 stars
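
For background on the entry above, a naive sketch of int8 KV cache quantization in PyTorch, one scale per tensor. GEAR's near-lossless recipe is more involved (it also handles outliers and approximation residuals); this only shows where the memory saving comes from:

```python
import torch

def quantize_kv(t):
    """Symmetric int8 quantization of a cached key or value tensor."""
    scale = t.abs().max() / 127.0
    q = torch.clamp(torch.round(t / scale), -127, 127).to(torch.int8)
    return q, scale

def dequantize_kv(q, scale):
    return q.float() * scale

# One layer's cache: (batch, heads, seq_len, head_dim), held in int8 at rest.
k = torch.randn(1, 8, 1024, 64)
qk, sk = quantize_kv(k)
k_hat = dequantize_kv(qk, sk)          # dequantize just before attention
print((k - k_hat).abs().max())         # error on the order of sk / 2
```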

Algorithm-Hardware Co-Design of Distribution-Aware Logarithmic-Posit Encodings for Efficient DNN Inference

georgia-tech-synergy-lab/LogarithmicPosit 8 Mar 2024

Traditional Deep Neural Network (DNN) quantization methods using integer, fixed-point, or floating-point data types struggle to capture diverse DNN parameter distributions at low precision, and often require large silicon overhead and intensive quantization-aware training.

0 stars
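
As a taste of the non-uniform encodings the entry above argues for, a sketch of power-of-two (log-domain) quantization, which concentrates resolution near zero where DNN weights cluster. This illustrates the motivation only; it is not the paper's logarithmic-posit format:

```python
import numpy as np

def log2_quantize(x, levels=16):
    """Round magnitudes to the nearest power of two, keeping the sign.

    Unlike uniform fixed-point steps, log-domain codes give finer spacing
    for small values, matching bell-shaped DNN weight distributions.
    """
    sign = np.sign(x)
    mag = np.maximum(np.abs(x), 1e-12)
    e = np.round(np.log2(mag))                       # nearest exponent
    e = np.clip(e, e.max() - (levels - 1), e.max())  # keep `levels` exponents
    return sign * np.exp2(e)

w = np.random.randn(10_000).astype(np.float32) * 0.05
w_hat = log2_quantize(w)
print(np.abs(w - w_hat).mean())
```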

QAQ: Quality Adaptive Quantization for LLM KV Cache

clubiedong/kvcachequantization 7 Mar 2024

The emergence of LLMs has ignited a fresh surge of breakthroughs in NLP applications, particularly in domains such as question-answering systems and text generation.

18 stars

Behavior Generation with Latent Actions

jayLEE0301/vq_bet_official 5 Mar 2024

Unlike language or image generation, decision making requires modeling actions: continuous-valued vectors that are multimodal in distribution, potentially drawn from uncurated sources, and prone to compounding errors in sequential prediction (see the vector-quantization sketch after this entry).

42 stars
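
The repo above (vq_bet_official) builds on vector quantization to turn continuous actions into discrete tokens. A minimal sketch of the core codebook lookup, not the paper's residual-VQ training:

```python
import numpy as np

def vector_quantize(actions, codebook):
    """Assign each continuous action vector to its nearest codebook entry.

    actions:  (n, d) continuous action vectors.
    codebook: (k, d) code vectors (learned in practice; random here).
    Returns discrete token ids (n,) and the quantized actions (n, d).
    """
    # Squared Euclidean distance from every action to every code.
    d2 = ((actions[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    tokens = d2.argmin(axis=1)
    return tokens, codebook[tokens]

codebook = np.random.randn(512, 7)    # e.g. 512 codes for a 7-DoF action space
actions = np.random.randn(32, 7)
tokens, quantized = vector_quantize(actions, codebook)
print(tokens[:5], np.abs(actions - quantized).mean())
```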