Quantization
1039 papers with code • 10 benchmarks • 18 datasets
Quantization is a promising technique for reducing the computation cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).
Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
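As a concrete illustration of the idea above, here is a minimal sketch of affine (scale/zero-point) quantization of a float32 array to int8 and back, written in NumPy. The function names and the per-tensor min/max calibration are illustrative assumptions, not taken from the cited paper.

```python
import numpy as np

def quantize_int8(x):
    """Affine (scale / zero-point) quantization of a float32 array to int8."""
    scale = (x.max() - x.min()) / 255.0               # width of one int8 step
    zero_point = np.round(-128.0 - x.min() / scale)   # int8 code that dequantizes to 0.0
    q = np.clip(np.round(x / scale + zero_point), -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximate float32 array from the int8 codes."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
x_hat = dequantize_int8(q, scale, zp)
print("max abs reconstruction error:", np.abs(x - x_hat).max())
```

Training-time schemes such as the adaptive precision training cited above additionally apply this kind of fixed-point mapping to activations and gradients during back propagation.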
Libraries
Use these libraries to find Quantization models and implementations.
Latest papers with no code
Frame Quantization of Neural Networks
We present a post-training quantization algorithm with error estimates relying on ideas originating from frame theory.
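The frame-theoretic error estimates are specific to that paper, but the general post-training setting it builds on can be sketched as follows: quantize the weights of an already-trained layer from their statistics alone, with no retraining. The shapes and the max-abs scale rule below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# A "trained" weight matrix and a batch of calibration inputs (stand-ins).
W = rng.standard_normal((64, 128)).astype(np.float32)
X = rng.standard_normal((32, 128)).astype(np.float32)

# Post-training quantization: pick a scale from the trained weights only,
# round once to int8, and never update the weights again.
scale = np.abs(W).max() / 127.0
W_q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
W_hat = W_q.astype(np.float32) * scale

# Compare outputs of the original and the quantized layer on calibration data.
err = np.linalg.norm(X @ W.T - X @ W_hat.T) / np.linalg.norm(X @ W.T)
print(f"relative output error: {err:.4f}")
```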
Edge-Efficient Deep Learning Models for Automatic Modulation Classification: A Performance Analysis
The recent advancement in deep learning (DL) for automatic modulation classification (AMC) of wireless signals has encouraged numerous possible applications on resource-constrained edge devices.
Adapting LLaMA Decoder to Vision Transformer
We first "LLaMAfy" a standard ViT step-by-step to align with LLaMA's architecture, and find that directly applying a causal mask to the self-attention brings an attention collapse issue, resulting in failure of network training.
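For readers unfamiliar with the term, a causal (decoder-style) mask restricts each token to attend only to itself and earlier tokens. The sketch below shows what applying such a mask to patch-token self-attention means in principle; it is not the paper's implementation, and the head count and shapes are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(q, k, v, causal=False):
    """Single-head scaled dot-product attention over a token sequence."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                        # (tokens, tokens)
    if causal:
        # Causal mask: token i may only attend to tokens j <= i, as in a
        # LLaMA-style decoder. For image patch tokens this hides most of
        # the spatial context from the early patches.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ v

tokens = np.random.randn(196, 64).astype(np.float32)    # 14x14 patch tokens
out_bidirectional = self_attention(tokens, tokens, tokens, causal=False)
out_causal = self_attention(tokens, tokens, tokens, causal=True)
print(out_bidirectional.shape, out_causal.shape)
```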
CQIL: Inference Latency Optimization with Concurrent Computation of Quasi-Independent Layers
Fast-growing large-scale language models are delivering unprecedented performance on almost all natural language processing tasks.
Differentiable Search for Finding Optimal Quantization Strategy
To solve this issue, we propose a differentiable quantization strategy search (DQSS) that assigns the optimal quantization strategy to each individual layer by taking advantage of the benefits of different quantization algorithms.
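The paper's exact formulation is not reproduced here, but the underlying differentiable-search idea can be sketched as a softmax-weighted mixture of candidate quantizers whose mixing logits are learned per layer and discretized by argmax after the search. The candidate bit-widths and function names below are assumptions.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def uniform_quant(x, bits):
    """Symmetric uniform quantizer used as one search candidate."""
    levels = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / levels
    return np.clip(np.round(x / scale), -levels, levels) * scale

def mixed_quant(x, alpha, candidate_bits=(2, 4, 8)):
    """Differentiable relaxation: the layer output is a softmax-weighted
    mixture of candidate quantizers, so the strategy logits `alpha` can be
    optimized jointly with the network before picking the argmax choice."""
    probs = softmax(alpha)
    return sum(p * uniform_quant(x, b) for p, b in zip(probs, candidate_bits))

x = np.random.randn(256).astype(np.float32)
alpha = np.zeros(3)                      # learnable per-layer strategy logits
print("mean quantization error:", np.abs(x - mixed_quant(x, alpha)).mean())
```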
Collaborative Edge AI Inference over Cloud-RAN
To realize efficient uplink feature aggregation, we allow each RRH to receive local feature vectors from all devices over the same resource blocks simultaneously by leveraging an over-the-air computation (AirComp) technique.
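Conceptually, AirComp lets the channel itself perform the aggregation: when all devices transmit on the same resource blocks, the receiver observes the superposition (sum) of their analog signals plus noise rather than separable per-device messages. A toy sketch follows, with the device count, feature dimension, and noise level chosen arbitrarily.

```python
import numpy as np

rng = np.random.default_rng(0)

# Local feature vectors from 8 edge devices (stand-in values).
features = rng.standard_normal((8, 32)).astype(np.float32)

# Over-the-air computation: all devices transmit simultaneously on the same
# resource blocks, so the RRH receives the superposition (sum) of the analog
# signals plus receiver noise, i.e. the aggregation happens in the channel
# instead of decoding each device separately.
noise = 0.01 * rng.standard_normal(32).astype(np.float32)
received = features.sum(axis=0) + noise

err = np.abs(received - features.sum(axis=0)).max()
print(f"max aggregation error due to channel noise: {err:.4f}")
```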
AiSAQ: All-in-Storage ANNS with Product Quantization for DRAM-free Information Retrieval
Although it claims to save memory by loading vectors compressed with product quantization (PQ), its memory usage still increases in proportion to the scale of the dataset.
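As background, product quantization splits each vector into sub-vectors and replaces each sub-vector with the index of its nearest codeword in a per-subspace codebook, so storage drops from floats to a few bytes per vector. The sketch below illustrates only the encoding step; the codebook sizes, the tiny k-means, and the function names are assumptions.

```python
import numpy as np

def kmeans(x, k, iters=20, seed=0):
    """Tiny k-means, enough to build one sub-quantizer codebook."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), k, replace=False)]
    for _ in range(iters):
        assign = np.argmin(((x[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = x[assign == j].mean(axis=0)
    return centers

def pq_encode(X, n_sub=4, k=256):
    """Product quantization: split each vector into n_sub chunks and store
    one uint8 codebook index per chunk instead of the float chunk."""
    d = X.shape[1] // n_sub
    codebooks, codes = [], []
    for s in range(n_sub):
        chunk = X[:, s * d:(s + 1) * d]
        cb = kmeans(chunk, k)
        codebooks.append(cb)
        codes.append(np.argmin(((chunk[:, None] - cb[None]) ** 2).sum(-1), axis=1))
    return codebooks, np.stack(codes, axis=1).astype(np.uint8)

X = np.random.randn(1000, 64).astype(np.float32)
codebooks, codes = pq_encode(X, n_sub=4, k=256)
print(X.nbytes, "bytes of floats ->", codes.nbytes, "bytes of PQ codes")
```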
Encoder-Quantization-Motion-based Video Quality Metrics
In this work we merge several datasets into one to support the creation of a metric tailored for video compression and scaling.
Investigating the Impact of Quantization on Adversarial Robustness
Quantization is a promising technique for reducing the bit-width of deep models to improve their runtime performance and storage efficiency, and has thus become a fundamental step for deployment.
Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws
More broadly, we present 12 results on how (1) training duration, (2) model architecture, (3) quantization, (4) sparsity constraints such as MoE, and (5) data signal-to-noise ratio affect a model's knowledge storage capacity.