Quantization

1039 papers with code • 10 benchmarks • 18 datasets

Quantization is a promising technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).

Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
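
To make the idea concrete, here is a minimal, illustrative sketch of affine int8 quantization and dequantization in NumPy. The function names, tensor shapes, and the simple min/max calibration are assumptions for illustration, not the scheme of the cited paper.

```python
import numpy as np

def quantize_int8(x: np.ndarray):
    """Affine (asymmetric) quantization of a float32 array to int8."""
    scale = (x.max() - x.min()) / 255.0            # spread the observed range over 256 levels
    zero_point = np.round(-x.min() / scale) - 128  # integer offset so x.min() maps to -128
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize(q: np.ndarray, scale: float, zero_point: float) -> np.ndarray:
    """Recover an approximate float32 array from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
print("max abs error:", np.abs(x - dequantize(q, scale, zp)).max())
```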

Latest papers with no code

Frame Quantization of Neural Networks

no code yet • 11 Apr 2024

We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory.
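
The frame-theoretic error estimates themselves are not described here; for orientation only, a generic post-training quantization pass that quantizes pretrained weights per layer and reports the induced error looks roughly like the following. The layer names, shapes, and the symmetric uniform quantizer are illustrative assumptions, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in pretrained weights for two layers (hypothetical names and shapes).
weights = {"fc1": rng.normal(size=(128, 64)), "fc2": rng.normal(size=(10, 128))}

def ptq_uniform(w: np.ndarray, bits: int = 8) -> np.ndarray:
    """Symmetric uniform post-training quantization of one weight tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale  # the dequantized weights actually used at inference time

for name, w in weights.items():
    w_hat = ptq_uniform(w)
    rel_err = np.linalg.norm(w - w_hat) / np.linalg.norm(w)
    print(f"{name}: relative quantization error = {rel_err:.4f}")
```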

Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis

no code yet • 11 Apr 2024

Recent advances in deep learning (DL) for automatic modulation classification (AMC) of wireless signals have encouraged numerous possible applications on resource-constrained edge devices.

Adapting LLaMA Decoder to Vision Transformer

no code yet • 10 Apr 2024

We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention causes an attention collapse issue, resulting in failure of network training.
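
For context, the causal mask in question restricts each position to attend only to earlier positions. A minimal sketch of masked scaled-dot-product self-attention follows; it is illustrative only, not the paper's ViT-specific setup, and queries, keys, and values are collapsed to the input for brevity.

```python
import torch
import torch.nn.functional as F

def causal_self_attention(x: torch.Tensor) -> torch.Tensor:
    """Scaled dot-product self-attention with a causal (lower-triangular) mask.

    x: (batch, seq_len, dim); queries, keys, and values are all x for brevity.
    """
    b, n, d = x.shape
    scores = x @ x.transpose(-2, -1) / d ** 0.5            # (b, n, n) attention logits
    mask = torch.tril(torch.ones(n, n, dtype=torch.bool))  # position i may attend to j <= i only
    scores = scores.masked_fill(~mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x

out = causal_self_attention(torch.randn(2, 16, 32))
print(out.shape)  # torch.Size([2, 16, 32])
```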

CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers

no code yet • 10 Apr 2024

Fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks.

Differentiable Search for Finding Optimal Quantization Strategy

no code yet • 10 Apr 2024

To address this issue, we propose differentiable quantization strategy search (DQSS), which assigns an optimal quantization strategy to each layer by taking advantage of the benefits of different quantization algorithms.
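
The paper's exact formulation is not given here; one common way to make such a per-layer search differentiable is a DARTS-style softmax relaxation over candidate quantizers combined with a straight-through estimator, sketched below. The candidate bit-widths, stand-in loss, and tensor shapes are hypothetical.

```python
import torch

def fake_quant(w: torch.Tensor, bits: int) -> torch.Tensor:
    """Uniform symmetric fake-quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1
    scale = w.abs().max() / qmax
    q = torch.clamp(torch.round(w / scale), -qmax, qmax) * scale
    return w + (q - w).detach()  # forward uses q, backward passes gradients through to w

candidate_bits = [2, 4, 8]                                    # hypothetical candidate strategies
alpha = torch.zeros(len(candidate_bits), requires_grad=True)  # per-layer architecture parameters
w = torch.randn(64, 64, requires_grad=True)                   # stand-in layer weights

# A softmax-weighted mixture of the candidates makes the strategy choice differentiable,
# so alpha can be optimized jointly with the weights; the final strategy is the argmax.
probs = torch.softmax(alpha, dim=0)
w_q = sum(p * fake_quant(w, b) for p, b in zip(probs, candidate_bits))

loss = (w_q ** 2).mean()  # stand-in task loss
loss.backward()
print("strategy probabilities:", probs.detach(), "alpha gradient:", alpha.grad)
```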

Collaborative Edge AI Inference over Cloud-RAN

no code yet • 9 Apr 2024

To realize efficient uplink feature aggregation, we allow each RRH to receive local feature vectors from all devices simultaneously over the same resource blocks by leveraging an over-the-air computation (AirComp) technique.
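
A minimal sketch of the AirComp idea, assuming ideal pre-equalization so that simultaneous transmissions superimpose into the desired sum at the receiver; the device count, feature dimension, and noise level are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
num_devices, feat_dim = 8, 16  # hypothetical number of edge devices and feature dimension

# Local feature vectors held by each edge device.
features = rng.normal(size=(num_devices, feat_dim))

# AirComp idea: all devices transmit on the same resource block at once, so the wireless
# channel superimposes their (ideally pre-equalized) signals; the RRH observes the sum of
# the feature vectors plus receiver noise instead of decoding each device separately.
noise = rng.normal(scale=0.05, size=feat_dim)
received = features.sum(axis=0) + noise

print("aggregation error:", np.linalg.norm(received - features.sum(axis=0)))
```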

AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval

no code yet • 9 Apr 2024

Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage increases in proportion to the scale of the dataset.
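
For readers unfamiliar with PQ: it splits each vector into subvectors and stores only the index of the nearest centroid per subvector, so the memory per vector drops from d float32 values to m small codes. A minimal NumPy sketch follows; the codebook size, subspace count, and the tiny k-means trainer are illustrative assumptions, not AiSAQ's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m, k = 1000, 64, 8, 256                  # vectors, dims, subspaces, centroids per subspace
x = rng.normal(size=(n, d)).astype(np.float32)
sub = d // m

def kmeans(data, k, iters=10):
    """Tiny Lloyd's k-means, just enough to fit one PQ codebook."""
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((data[:, None, :] - centroids[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            pts = data[assign == j]
            if len(pts):
                centroids[j] = pts.mean(axis=0)
    return centroids

# One codebook per subspace; each vector is then stored as m uint8 codes (8 bytes here).
codebooks = [kmeans(x[:, i * sub:(i + 1) * sub], k) for i in range(m)]
codes = np.stack([
    np.argmin(((x[:, i * sub:(i + 1) * sub, None] - codebooks[i].T[None]) ** 2).sum(1), axis=1)
    for i in range(m)
], axis=1).astype(np.uint8)

decoded = np.concatenate([codebooks[i][codes[:, i]] for i in range(m)], axis=1)
print("bytes per vector:", codes.shape[1], "vs float32:", x.shape[1] * 4)
print("mean reconstruction error:", np.abs(x - decoded).mean())
```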

Encoder-Quantization-Motion-based Video Quality Metrics

no code yet • 9 Apr 2024

In this work we merge several datasets into one to support the creation of a metric tailored for video compression and scaling.

Investigating the Impact of Quantization on Adversarial Robustness

no code yet • 8 Apr 2024

Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and has thus become a fundamental step for deployment.

Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws

no code yet • 8 Apr 2024

More broadly, we present 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints such as MoE, and (5) data signal-to-noise ratio affect a model's knowledge storage capacity.