Quantization

1039 papers with code • 10 benchmarks • 18 datasets

Quantization is a promising technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
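
To make the idea concrete, here is a minimal, illustrative sketch of affine int8 quantization and dequantization in NumPy. The function names, tensor shapes, and the simple min/max calibration are assumptions for illustration, not the scheme of the cited paper.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 array to int8."""
    scale = (x.max() - x.min()) / 255.0            # spread the observed range over 256 levels
    zero_point = np.round(-x.min() / scale) - 128  # integer offset so x.min() maps to -128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover an approximate float32 array from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, scale, zp)).max())
```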

Latest papers with no code

Frame Quantization of Neural Networks

no code yet • 11 Apr 2024

We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory.
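
The frame-theoretic error estimates themselves are not described here; for orientation only, a generic post-training quantization pass that quantizes pretrained weights per layer and reports the induced error looks roughly like the following. The layer names, shapes, and the symmetric uniform quantizer are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in pretrained weights for two layers (hypothetical names and shapes).
weights = {"fc1": rng.normal(size=(128, 64)), "fc2": rng.normal(size=(10, 128))}

def ptq_uniform(w: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric uniform post-training quantization of one weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # the dequantized weights actually used at inference time

for name, w in weights.items():
    w_hat = ptq_uniform(w)
    rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
    print(f"{name}: relative quantization error = {rel_err:.4f}")
```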

Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis

no code yet • 11 Apr 2024

Recent advances in deep learning (DL) for automatic modulation classification (AMC) of wireless signals have encouraged numerous possible applications on resource-constrained edge devices.

Adapting LLaMA Decoder to Vision Transformer

no code yet • 10 Apr 2024

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, resulting in failure of network training.
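
For context, the causal mask in question restricts each position to attend only to earlier positions. A minimal sketch of masked scaled-dot-product self-attention follows; it is illustrative only, not the paper's ViT-specific setup, and queries, keys, and values are collapsed to the input for brevity.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product self-attention with a causal (lower-triangular) mask.

    x: (batch, seq_len, dim); queries, keys, and values are all x for brevity.
    """
    b, n, d = x.shape
    scores = x @ x.transpose(-2, -1) / d ** 0.5            # (b, n, n) attention logits
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # position i may attend to j <= i only
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x

out = causal_self_attention(torch.randn(2, 16, 32))
print(out.shape)  # torch.Size([2, 16, 32])
```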

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

no code yet • 10 Apr 2024

Fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks.

Differentiable Search for Finding Optimal Quantization Strategy

no code yet • 10 Apr 2024

To address this issue, we propose differentiable quantization strategy search (DQSS), which assigns an optimal quantization strategy to each layer by taking advantage of the benefits of different quantization algorithms.
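
The paper's exact formulation is not given here; one common way to make such a per-layer search differentiable is a DARTS-style softmax relaxation over candidate quantizers combined with a straight-through estimator, sketched below. The candidate bit-widths, stand-in loss, and tensor shapes are hypothetical.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()  # forward uses q, backward passes gradients through to w

candidate_bits = [2, 4, 8]                                    # hypothetical candidate strategies
alpha = torch.zeros(len(candidate_bits), requires_grad=True)  # per-layer architecture parameters
w = torch.randn(64, 64, requires_grad=True)                   # stand-in layer weights

# A softmax-weighted mixture of the candidates makes the strategy choice differentiable,
# so alpha can be optimized jointly with the weights; the final strategy is the argmax.
probs = torch.softmax(alpha, dim=0)
w_q = sum(p * fake_quant(w, b) for p, b in zip(probs, candidate_bits))

loss = (w_q ** 2).mean()  # stand-in task loss
loss.backward()
print("strategy probabilities:", probs.detach(), "alpha gradient:", alpha.grad)
```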

Collaborative Edge AI Inference over Cloud-RAN

no code yet • 9 Apr 2024

To realize efficient uplink feature aggregation, we allow each RRH to receive local feature vectors from all devices simultaneously over the same resource blocks by leveraging an over-the-air computation (AirComp) technique.
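
A minimal sketch of the AirComp idea, assuming ideal pre-equalization so that simultaneous transmissions superimpose into the desired sum at the receiver; the device count, feature dimension, and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_devices, feat_dim = 8, 16  # hypothetical number of edge devices and feature dimension

# Local feature vectors held by each edge device.
features = rng.normal(size=(num_devices, feat_dim))

# AirComp idea: all devices transmit on the same resource block at once, so the wireless
# channel superimposes their (ideally pre-equalized) signals; the RRH observes the sum of
# the feature vectors plus receiver noise instead of decoding each device separately.
noise = rng.normal(scale=0.05, size=feat_dim)
received = features.sum(axis=0) + noise

print("aggregation error:", np.linalg.norm(received - features.sum(axis=0)))
```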

AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

no code yet • 9 Apr 2024

Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage increases in proportion to the scale of the dataset.
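
For readers unfamiliar with PQ: it splits each vector into subvectors and stores only the index of the nearest centroid per subvector, so the memory per vector drops from d float32 values to m small codes. A minimal NumPy sketch follows; the codebook size, subspace count, and the tiny k-means trainer are illustrative assumptions, not AiSAQ's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 1000, 64, 8, 256                  # vectors, dims, subspaces, centroids per subspace
x = rng.normal(size=(n, d)).astype(np.float32)
sub = d // m

def kmeans(data, k, iters=10):
    """Tiny Lloyd's k-means, just enough to fit one PQ codebook."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = data[assign == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids

# One codebook per subspace; each vector is then stored as m uint8 codes (8 bytes here).
codebooks = [kmeans(x[:, i * sub:(i + 1) * sub], k) for i in range(m)]
codes = np.stack([
    np.argmin(((x[:, i * sub:(i + 1) * sub, None] - codebooks[i].T[None]) ** 2).sum(1), axis=1)
    for i in range(m)
], axis=1).astype(np.uint8)

decoded = np.concatenate([codebooks[i][codes[:, i]] for i in range(m)], axis=1)
print("bytes per vector:", codes.shape[1], "vs float32:", x.shape[1] * 4)
print("mean reconstruction error:", np.abs(x - decoded).mean())
```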

Encoder-Quantization-Motion-based Video Quality Metrics

no code yet • 9 Apr 2024

In this work we merge several datasets into one to support the creation of a metric tailored for video compression and scaling.

Investigating the Impact of Quantization on Adversarial Robustness

no code yet • 8 Apr 2024

Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and has thus become a fundamental step for deployment.

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

no code yet • 8 Apr 2024

More broadly, we present 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints such as MoE, and (5) data signal-to-noise ratio affect a model's knowledge storage capacity.