Quantization


97 papers with code · Methodology


Latest papers without code

An Inter-Layer Weight Prediction and Quantization for Deep Neural Networks based on a Smoothly Varying Weight Hypothesis

16 Jul 2019

Based on SVWH and the inter-frame prediction used in conventional video coding schemes, we propose a new Inter-Layer Weight Prediction (ILWP) and quantization method that quantizes the predicted residuals of the weights.
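The snippet above describes quantizing inter-layer weight residuals rather than the weights themselves, by analogy with inter-frame prediction in video coding. A minimal sketch of that idea, assuming layers with identically shaped weights and a plain symmetric uniform quantizer (the paper's actual quantizer and prediction scheme may differ):

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    # Symmetric uniform quantizer (an illustrative stand-in for the
    # paper's quantizer, not its exact scheme).
    max_abs = np.max(np.abs(x))
    if max_abs == 0:
        return x.copy()
    scale = max_abs / (2 ** (num_bits - 1) - 1)
    return np.round(x / scale) * scale

def ilwp_quantize(layer_weights):
    # Inter-layer prediction: the first layer is quantized directly;
    # each following layer is predicted from the previously reconstructed
    # layer, and only the (small) residual is quantized -- analogous to
    # inter-frame prediction in video coding.
    reconstructed = []
    prev = None
    for w in layer_weights:
        if prev is None:
            rec = quantize_uniform(w)
        else:
            rec = prev + quantize_uniform(w - prev)
        reconstructed.append(rec)
        prev = rec
    return reconstructed
```

Because adjacent layers are assumed to vary smoothly (SVWH), the residuals have a much smaller dynamic range than the raw weights, so the same bit budget yields a finer quantization step.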


Learning Multimodal Fixed-Point Weights using Gradient Descent

16 Jul 2019

Due to their high computational complexity, deep neural networks are still limited to powerful processing units.


The Bach Doodle: Approachable music composition with machine learning at scale

14 Jul 2019

To make music composition more approachable, we designed the first AI-powered Google Doodle, the Bach Doodle, where users can create their own melody and have it harmonized by a machine learning model Coconet (Huang et al., 2017) in the style of Bach.


And the Bit Goes Down: Revisiting the Quantization of Neural Networks

12 Jul 2019

In this paper, we address the problem of reducing the memory footprint of ResNet-like convolutional network architectures.


A Targeted Acceleration and Compression Framework for Low bit Neural Networks

9 Jul 2019

In this paper, we propose a novel Targeted Acceleration and Compression (TAC) framework to improve the performance of 1-bit deep neural networks. We consider that the acceleration and compression effects of binarizing fully connected layers are not sufficient to compensate for the accuracy loss caused by it. In the proposed framework, the convolutional and fully connected layers are separated and optimized individually.


Multi-Scale Vector Quantization with Reconstruction Trees

8 Jul 2019

Our main technical contribution is an analysis of the expected distortion achieved by the proposed algorithm, when the data are assumed to be sampled from a fixed unknown distribution.
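The expected distortion analyzed above is, empirically, the mean squared distance from each sample to its nearest codeword. A minimal sketch of that quantity for an arbitrary codebook (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def empirical_distortion(data, codebook):
    # data: (n, d) samples; codebook: (k, d) codewords.
    # Squared distance from every sample to every codeword via broadcasting,
    # then the mean over samples of the distance to the nearest codeword --
    # the quantity whose expectation the paper's analysis bounds.
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean()
```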


Non-structured DNN Weight Pruning Considered Harmful

3 Jul 2019

Based on the proposed comparison framework, with the same accuracy and quantization, the results show that non-structured pruning is not competitive in terms of both storage and computation efficiency.


Deep Convolutional Compression for Massive MIMO CSI Feedback

2 Jul 2019

In comparison with previous works, the main contributions of DeepCMC are two-fold: i) DeepCMC is fully convolutional, and it can be used in a wide range of scenarios with various numbers of sub-channels and transmit antennas; ii) DeepCMC includes quantization and entropy coding blocks and minimizes a cost function that accounts for both the rate of compression and the reconstruction quality of the channel matrix at the BS.
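The cost function mentioned above trades off the rate of compression against the reconstruction quality of the channel matrix. A minimal sketch of such a weighted rate-distortion objective; the weighting factor `lam` and the bit estimate passed in are illustrative assumptions, not values from the paper:

```python
import numpy as np

def rate_distortion_cost(h, h_rec, est_bits, lam=0.01):
    # Distortion: reconstruction error of the channel matrix at the BS.
    distortion = np.mean((h - h_rec) ** 2)
    # Rate: estimated bits per element spent by the quantization and
    # entropy-coding blocks (est_bits is assumed to be a total).
    rate = est_bits / h.size
    # Weighted sum: larger lam favors compression over fidelity.
    return distortion + lam * rate
```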


Weight Normalization based Quantization for Deep Neural Network Compression

1 Jul 2019

WNQ adopts weight normalization to avoid the long-tail distribution of network weights and subsequently reduces the quantization error.
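The snippet above describes taming the long-tail distribution of network weights before quantizing, so that outliers do not stretch the quantization range. A minimal sketch of that idea using clipping at a multiple of the standard deviation; the clipping rule is an assumption for illustration, not the paper's exact WNQ procedure:

```python
import numpy as np

def quantize_clipped(w, num_bits=4, clip_sigmas=3.0):
    # Long-tailed outliers would otherwise dictate the quantization range;
    # clipping at a few standard deviations keeps the step size small.
    # (The clip rule is an illustrative assumption, not the paper's WNQ.)
    c = clip_sigmas * w.std()
    if c == 0:
        return w.copy()
    levels = 2 ** (num_bits - 1) - 1
    scale = c / levels
    return np.round(np.clip(w, -c, c) / scale) * scale
```

With 4 bits this maps every weight onto at most 15 levels spread over roughly six standard deviations, instead of over the full outlier-dominated range.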


Compression of Acoustic Event Detection Models With Quantized Distillation

1 Jul 2019

Acoustic Event Detection (AED), aiming at detecting categories of events based on audio signals, has found application in many intelligent systems.