Data Free Quantization
17 papers with code • 2 benchmarks • 1 datasets
Data Free Quantization is a technique to achieve a highly accurate quantized model without accessing any training data.
Source: Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
Libraries
Use these libraries to find Data Free Quantization models and implementationsMost implemented papers
Data-Free Quantization Through Weight Equalization and Bias Correction
This improves quantization accuracy performance, and can be applied to many common computer vision architectures with a straight forward API call.
ZeroQ: A Novel Zero Shot Quantization Framework
Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0. 5\% of one epoch training time of ResNet50 on ImageNet).
Generative Low-bitwidth Data Free Quantization
More critically, our method achieves much higher accuracy on 4-bit quantization than the existing data free quantization method.
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models
Several post-training quantization methods have been applied to large language models (LLMs), and have been shown to perform well down to 8-bits.
Qimera: Data-free Quantization with Synthetic Boundary Supporting Samples
We find that this is often insufficient to capture the distribution of the original data, especially around the decision boundaries.
Zero-shot Adversarial Quantization
To address the above issues, we propose a zero-shot adversarial quantization (ZAQ) framework, facilitating effective discrepancy estimation and knowledge transfer from a full-precision model to its quantized model.
Diverse Sample Generation: Pushing the Limit of Generative Data-free Quantization
We first give a theoretical analysis that the diversity of synthetic samples is crucial for the data-free quantization, while in existing approaches, the synthetic data completely constrained by BN statistics experimentally exhibit severe homogenization at distribution and sample levels.
SQuant: On-the-Fly Data-Free Quantization via Diagonal Hessian Approximation
This paper proposes an on-the-fly DFQ framework with sub-second quantization time, called SQuant, which can quantize networks on inference-only devices with low computation and memory requirements.
Patch Similarity Aware Data-Free Quantization for Vision Transformers
The above insights guide us to design a relative value metric to optimize the Gaussian noise to approximate the real images, which are then utilized to calibrate the quantization parameters.
It's All In the Teacher: Zero-Shot Quantization Brought Closer to the Teacher
To deal with the performance drop induced by quantization errors, a popular method is to use training data to fine-tune quantized networks.