A Statistical Framework for Low-bitwidth Training of Deep Neural Networks

Fully quantized training (FQT), which uses low-bitwidth hardware by quantizing the activations, weights, and gradients of a neural network model, is a promising approach to accelerate the training of deep neural networks. One major challenge with FQT is the lack of theoretical understanding, in particular of how gradient quantization impacts convergence properties. In this paper, we address this problem by presenting a statistical framework for analyzing FQT algorithms. We view the quantized gradient of FQT as a stochastic estimator of its full-precision counterpart, the gradient used in quantization-aware training (QAT). We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance. Inspired by these theoretical results, we develop two novel gradient quantizers, and we show that these have smaller variance than the existing per-tensor quantizer. For training ResNet-50 on ImageNet, our 5-bit block Householder quantizer achieves only 0.5% validation accuracy loss relative to QAT, comparable to the existing INT8 baseline.
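
The unbiasedness claim rests on quantizers that use stochastic rounding, so that the dequantized gradient equals the full-precision gradient in expectation. The NumPy sketch below is a minimal illustration of a per-tensor unbiased quantizer, not the paper's per-sample or block Householder quantizer; the function name, the 5-bit setting, and the min-max scaling are illustrative assumptions.

```python
import numpy as np

def stochastic_round_quantize(x, num_bits=5):
    """Per-tensor quantizer sketch: map x onto a (2**num_bits - 1)-level grid
    and apply stochastic rounding, which is unbiased in expectation."""
    qmax = 2 ** num_bits - 1
    lo, hi = x.min(), x.max()
    scale = (hi - lo) / qmax if hi > lo else 1.0
    scaled = (x - lo) / scale                      # values in [0, qmax]
    floor = np.floor(scaled)
    prob_up = scaled - floor                       # fractional part
    q = floor + (np.random.rand(*x.shape) < prob_up)  # stochastic rounding
    return q * scale + lo                          # dequantize back to float

# Empirical check of unbiasedness: the mean of many quantized draws
# should approach the original tensor.
x = np.random.randn(4, 8).astype(np.float32)
est = np.mean([stochastic_round_quantize(x) for _ in range(10000)], axis=0)
print(np.max(np.abs(est - x)))   # small residual, shrinking with more draws
```

The paper's quantizers keep this unbiasedness while reducing the variance that a single per-tensor scale incurs when gradient magnitudes vary widely across samples or blocks.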


Results from Other Papers


| Task | Dataset | Model | Metric | Value | Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | PSQ (Chen et al., 2020) | Accuracy | 67.5 | #21 |
| Semantic Textual Similarity | MRPC | PSQ (Chen et al., 2020) | Accuracy | 90.4 | #13 |
| Natural Language Inference | MultiNLI | PSQ (Chen et al., 2020) | Matched | 89.9 | #11 |
| Natural Language Inference | QNLI | PSQ (Chen et al., 2020) | Accuracy | 94.5 | #15 |
| Natural Language Inference | RTE | PSQ (Chen et al., 2020) | Accuracy | 86.8 | #23 |
| Sentiment Analysis | SST-2 Binary classification | PSQ (Chen et al., 2020) | Accuracy | 96.2 | #19 |
| Semantic Textual Similarity | STS Benchmark | PSQ (Chen et al., 2020) | Pearson Correlation | 0.919 | #9 |
