Quantization
1000 papers with code • 9 benchmarks • 17 datasets
Quantization is a promising technique for reducing the computational cost of neural network training by replacing high-cost floating-point numbers (e.g., float32) with low-cost fixed-point numbers (e.g., int8/int16).
Source: Adaptive Precision Training: Quantify Back Propagation in Neural Networks with Fixed-point Numbers
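As a rough illustration of the idea above, the sketch below converts a float32 tensor to int8 with a scale and zero-point and back again. It is a generic affine quantizer, not the adaptive-precision scheme from the cited paper; function names and the helper structure are illustrative.

```python
import numpy as np

def quantize_int8(x: np.ndarray) -> tuple[np.ndarray, float, int]:
    """Affine (asymmetric) quantization of a float32 tensor to int8.

    Returns the quantized tensor plus the (scale, zero_point) needed to
    dequantize. Generic illustration only, not any specific paper's scheme.
    """
    qmin, qmax = -128, 127
    x_min, x_max = float(x.min()), float(x.max())
    scale = (x_max - x_min) / (qmax - qmin) or 1.0  # avoid zero scale for constant tensors
    zero_point = int(round(qmin - x_min / scale))
    q = np.clip(np.round(x / scale) + zero_point, qmin, qmax).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q: np.ndarray, scale: float, zero_point: int) -> np.ndarray:
    """Map int8 values back to approximate float32 values."""
    return (q.astype(np.float32) - zero_point) * scale

x = np.random.randn(4, 4).astype(np.float32)
q, scale, zp = quantize_int8(x)
print(np.abs(x - dequantize_int8(q, scale, zp)).max())  # small reconstruction error
```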
Libraries
Use these libraries to find Quantization models and implementations.
Latest papers with no code
Super-High-Fidelity Image Compression via Hierarchical-ROI and Adaptive Quantization
MSE-based models aim to improve objective metrics, while generative models are leveraged to improve visual quality as measured by subjective metrics.
Hierarchical Frequency-based Upsampling and Refining for Compressed Video Quality Enhancement
The goal of video quality enhancement is to reduce compression artifacts and reconstruct a visually pleasing result.
Spatio-Temporal Fluid Dynamics Modeling via Physical-Awareness and Parameter Diffusion Guidance
This paper proposes a two-stage framework named ST-PAD for spatio-temporal fluid dynamics modeling in the field of earth sciences, aiming to achieve high-precision simulation and prediction of fluid dynamics through spatio-temporal physics awareness and parameter diffusion guidance.
HyperVQ: MLR-based Vector Quantization in Hyperbolic Space
However, since the VQVAE is trained with a reconstruction objective, there is no constraint for the embeddings to be well disentangled, a crucial aspect for using them in discriminative tasks.
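For context on what the VQVAE's vector quantization step does, here is a minimal sketch of the standard Euclidean nearest-codebook lookup; names and shapes are assumptions for illustration, and HyperVQ itself replaces this Euclidean assignment with an MLR-based rule in hyperbolic space.

```python
import numpy as np

def vq_lookup(z: np.ndarray, codebook: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Standard VQ-VAE quantization step: replace each encoder output vector
    with its nearest codebook entry (squared Euclidean distance).

    z:        (N, D) encoder outputs
    codebook: (K, D) learned embeddings
    """
    # Distance between every latent and every codebook entry, shape (N, K).
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    indices = dists.argmin(axis=1)   # discrete token ids
    z_q = codebook[indices]          # quantized latents fed to the decoder
    return z_q, indices

codebook = np.random.randn(512, 64)
z = np.random.randn(16, 64)
z_q, ids = vq_lookup(z, codebook)
```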
Decoding Compressed Trust: Scrutinizing the Trustworthiness of Efficient LLMs Under Compression
While state-of-the-art (SoTA) compression methods boast impressive advancements in preserving benign task performance, the potential risks of compression in terms of safety and trustworthiness have been largely neglected.
Quantization Effects on Neural Networks Perception: How would quantization change the perceptual field of vision models?
Neural network quantization is an essential technique for deploying models on resource-constrained devices.
Quantization Avoids Saddle Points in Distributed Optimization
More specifically, we propose a stochastic quantization scheme and prove that it can effectively escape saddle points and ensure convergence to a second-order stationary point in distributed nonconvex optimization.
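A common building block for such schemes is unbiased stochastic rounding, sketched below under the assumption of a uniform quantization grid; the paper's actual quantization scheme may differ.

```python
import numpy as np

def stochastic_round(x: np.ndarray, step: float = 1e-2) -> np.ndarray:
    """Unbiased stochastic rounding onto a uniform grid of spacing `step`.

    Each value rounds up or down at random with probabilities chosen so that
    the expected quantized value equals the input, i.e. E[q(x)] = x.
    Illustrative construction, not the cited paper's exact scheme.
    """
    scaled = x / step
    low = np.floor(scaled)
    p_up = scaled - low                           # probability of rounding up
    rounded = low + (np.random.rand(*x.shape) < p_up)
    return rounded * step

g = np.random.randn(5)
print(g, stochastic_round(g))
```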
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
In this paper, we propose UniCode, a novel approach within the domain of multimodal large language models (MLLMs) that learns a unified codebook to efficiently tokenize visual, text, and potentially other types of signals.
Generalized Relevance Learning Grassmann Quantization
The proposed model returns a set of prototype subspaces and a relevance vector.
FedComLoc: Communication-Efficient Distributed Training of Sparse and Quantized Models
Federated Learning (FL) has garnered increasing attention due to its unique characteristic of allowing heterogeneous clients to process their private data locally and interact with a central server, while being respectful of privacy.