Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

12 Sep 2019 · Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

Transformer-based architectures have become the de facto models for a range of Natural Language Processing tasks. In particular, BERT-based models have achieved significant accuracy gains on GLUE tasks, CoNLL-03, and SQuAD. However, BERT-based models have a prohibitive memory footprint and latency. As a result, deploying BERT-based models in resource-constrained environments has become a challenging task. In this work, we perform an extensive analysis of fine-tuned BERT models using second-order Hessian information, and we use our results to propose a novel method for quantizing BERT models to ultra-low precision. In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further. We extensively test our proposed method on the BERT downstream tasks SST-2, MNLI, CoNLL-03, and SQuAD. We achieve performance comparable to the baseline, with at most $2.3\%$ performance degradation, even with ultra-low precision quantization down to 2 bits, corresponding to up to $13\times$ compression of the model parameters and up to $4\times$ compression of the embedding table as well as the activations. Among all tasks, we observed the highest performance loss for BERT fine-tuned on SQuAD. By probing into the Hessian-based analysis as well as visualization, we show that this is related to the fact that the current training/fine-tuning strategy of BERT does not converge for SQuAD.
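
The abstract names two ingredients, group-wise quantization and Hessian-based mixed precision. The sketch below illustrates both under our own assumptions, not the authors' implementation. It uses symmetric uniform quantization with one scale per contiguous group of rows. It treats the top Hessian eigenvalue, estimated by power iteration from Hessian-vector products, as a per-layer sensitivity score. It then applies a simple rule that gives more bits to the more sensitive layers. The function names, the row grouping, the toy diagonal Hessians, and the 2/4-bit split are all hypothetical illustrations.

```python
import numpy as np


def quantize_groupwise(w: np.ndarray, bits: int, num_groups: int) -> np.ndarray:
    """Fake-quantize `w` with symmetric uniform quantization, one scale per group.

    Rows of `w` are split into `num_groups` contiguous groups (an assumed
    grouping); each group's scale comes from its own max |value|, so a group
    with large outliers does not coarsen the grid used by the other groups.
    """
    if bits < 2:
        raise ValueError("this symmetric scheme needs at least 2 bits")
    if not 1 <= num_groups <= w.shape[0]:
        raise ValueError("num_groups must be between 1 and the number of rows")
    qmax = 2 ** (bits - 1) - 1  # largest integer level, e.g. 1 for 2 bits
    out = np.zeros(w.shape, dtype=np.float64)
    for rows in np.array_split(np.arange(w.shape[0]), num_groups):
        group = w[rows]
        scale = np.abs(group).max() / qmax
        if scale == 0:
            continue  # an all-zero group stays zero
        levels = np.clip(np.round(group / scale), -qmax, qmax)
        out[rows] = levels * scale  # map integer levels back to floats
    return out


def top_hessian_eigenvalue(hvp, dim: int, iters: int = 100, seed: int = 0) -> float:
    """Power iteration using only Hessian-vector products hvp(v) = H @ v.

    Converges to the largest-magnitude eigenvalue of a symmetric H,
    returned as the Rayleigh quotient v^T H v of the unit iterate v.
    """
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    eig = 0.0
    for _ in range(iters):
        hv = hvp(v)
        eig = float(v @ hv)
        norm = np.linalg.norm(hv)
        if norm == 0.0:
            return 0.0
        v = hv / norm
    return eig


def allocate_bits(sensitivity: dict, low: int = 2, high: int = 4,
                  frac_high: float = 0.5) -> dict:
    """Hypothetical rule: the most sensitive fraction of layers gets `high` bits."""
    order = sorted(sensitivity, key=sensitivity.get, reverse=True)
    n_high = round(len(order) * frac_high)
    return {name: high if i < n_high else low for i, name in enumerate(order)}


def compression_ratio(num_params: dict, bits: dict, baseline_bits: int = 32) -> float:
    """Size of a `baseline_bits` model divided by the mixed-precision size."""
    total = sum(num_params.values())
    quantized = sum(num_params[name] * bits[name] for name in num_params)
    return baseline_bits * total / quantized


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal((8, 4))
    w_q = quantize_groupwise(w, bits=2, num_groups=4)
    print("max abs error at 2 bits:", np.abs(w - w_q).max())

    # Toy per-layer Hessians standing in for autograd Hessian-vector products.
    hessians = {"layer_a": np.diag([5.0, 1.0, 1.0]),
                "layer_b": np.diag([0.5, 0.2, 0.1])}
    sens = {name: top_hessian_eigenvalue(lambda v, h=h: h @ v, 3)
            for name, h in hessians.items()}
    bits = allocate_bits(sens)
    print("bits:", bits)
    print("ratio:", compression_ratio({"layer_a": 3, "layer_b": 3}, bits))
```

For a real model, the toy `hessians` dictionary would be replaced by Hessian-vector products of the fine-tuning loss computed with automatic differentiation. The abstract does not state Q-BERT's exact sensitivity statistic, group layout, or bit budget, so those choices here are illustrative only.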

Code

No code implementations yet.

Tasks

Linguistic Acceptability

Natural Language Inference

Quantization

Semantic Textual Similarity

Sentiment Analysis

SST-2

Datasets

GLUE

SST

SQuAD

MultiNLI

SST-2

QNLI

MRPC

CoNLL 2003

CoLA

RTE

STS Benchmark

Results from the Paper

Ranked #13 on Semantic Textual Similarity on STS Benchmark

Results from Other Papers

Task	Dataset	Model	Metric Name	Metric Value	Rank
Linguistic Acceptability	CoLA	Q-BERT (Shen et al., 2020)	Accuracy	65.1	#23
Semantic Textual Similarity	MRPC	Q-BERT (Shen et al., 2020)	Accuracy	88.2	#21
Natural Language Inference	MultiNLI	Q-BERT (Shen et al., 2020)	Matched	87.8	#18
Natural Language Inference	QNLI	Q-BERT (Shen et al., 2020)	Accuracy	93.0	#22
Natural Language Inference	RTE	Q-BERT (Shen et al., 2020)	Accuracy	84.7	#28
Sentiment Analysis	SST-2 Binary classification	Q-BERT (Shen et al., 2020)	Accuracy	94.8	#28
Semantic Textual Similarity	STS Benchmark	Q-BERT (Shen et al., 2020)	Pearson Correlation	0.911	#13

Methods

Adam • Attention Dropout • BERT • Dense Connections • Dropout • GELU • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • Weight Decay • WordPiece
