Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Linguistic Acceptability	CoLA	SqueezeBERT	Accuracy	46.5	#39
Semantic Textual Similarity	MRPC	SqueezeBERT	Accuracy	87.8	#24
Natural Language Inference	MultiNLI	SqueezeBERT	Matched	82.0	#40
Natural Language Inference	MultiNLI	SqueezeBERT	Mismatched	81.1	#30
Natural Language Inference	QNLI	SqueezeBERT	Accuracy	90.1	#37
Question Answering	Quora Question Pairs	SqueezeBERT	Accuracy	80.3	#18
Natural Language Inference	RTE	SqueezeBERT	Accuracy	73.2	#48
Sentiment Analysis	SST-2 Binary classification	SqueezeBERT	Accuracy	91.4	#52
Natural Language Inference	WNLI	SqueezeBERT	Accuracy	65.1	#20

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?

EMNLP (sustainlp) 2020 · Forrest N. Iandola, Albert E. Shaw, Ravi Krishna, Kurt W. Keutzer

Humans read and write hundreds of billions of messages every day. Further, due to the availability of large datasets, large computing systems, and better neural network models, natural language processing (NLP) technology has made significant strides in understanding, proofreading, and organizing these messages. Thus, there is a significant opportunity to deploy NLP in myriad applications to help web users, social networks, and businesses. In particular, we consider smartphones and other mobile devices as crucial platforms for deploying NLP models at scale. However, today's highly-accurate NLP neural network models such as BERT and RoBERTa are extremely computationally expensive, with BERT-base taking 1.7 seconds to classify a text snippet on a Pixel 3 smartphone. In this work, we observe that methods such as grouped convolutions have yielded significant speedups for computer vision networks, but many of these techniques have not been adopted by NLP neural network designers. We demonstrate how to replace several operations in self-attention layers with grouped convolutions, and we use this technique in a novel network architecture called SqueezeBERT, which runs 4.3x faster than BERT-base on the Pixel 3 while achieving competitive accuracy on the GLUE test set. The SqueezeBERT code will be released.
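The abstract's central idea, replacing operations in self-attention layers with grouped convolutions, can be illustrated with a small sketch. It relies on the fact that a position-wise fully-connected layer applied to every token is equivalent to a convolution with kernel size 1, so splitting the channels into groups cuts the weight count by the number of groups. This is not the authors' implementation: the function names, the tiny channel sizes, and the group count of 4 are all assumptions chosen for illustration.

```python
import random


def make_weights(c_in, c_out, groups, rng):
    """Build one (c_in/groups) x (c_out/groups) weight matrix per group (hypothetical helper)."""
    assert c_in % groups == 0 and c_out % groups == 0
    return [
        [[rng.uniform(-1.0, 1.0) for _ in range(c_out // groups)]
         for _ in range(c_in // groups)]
        for _ in range(groups)
    ]


def grouped_pointwise_conv(x, weights):
    """Apply a kernel-size-1 grouped convolution to a sequence of token vectors.

    x: list of tokens, each a list of c_in floats.
    weights: list of per-group matrices as produced by make_weights.
    With a single group this is an ordinary position-wise dense layer.
    """
    groups = len(weights)
    in_per_group = len(x[0]) // groups
    out = []
    for token in x:
        y = []
        for g, w in enumerate(weights):
            # Each group only sees its own slice of the input channels.
            chunk = token[g * in_per_group:(g + 1) * in_per_group]
            for j in range(len(w[0])):
                y.append(sum(chunk[i] * w[i][j] for i in range(in_per_group)))
        out.append(y)
    return out


def count_params(weights):
    """Number of scalar weights across all groups."""
    return sum(len(w) * len(w[0]) for w in weights)


rng = random.Random(0)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(8)] for _ in range(5)]

dense = make_weights(8, 8, groups=1, rng=rng)    # ordinary dense layer
grouped = make_weights(8, 8, groups=4, rng=rng)  # assumed group count

print(count_params(dense), count_params(grouped))
print(len(grouped_pointwise_conv(tokens, grouped)), "tokens out")
```

With 8 input and 8 output channels, the dense layer holds 64 weights and the 4-group variant holds 16, while the output keeps the same sequence length and channel count. The trade-off is that channels in different groups no longer mix within that layer.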

Code

huggingface/transformers official

125,019

huggingface/transformers

36,726

mindspore-courses/d2l-mindspore

renmada/squeezebert-paddle

Tasks

Linguistic Acceptability

Natural Language Inference

Question Answering

Semantic Textual Similarity

Sentiment Analysis

Text Classification

Transfer Learning

Datasets

GLUE

SST

MultiNLI

SST-2

QNLI

MRPC

CoLA

Quora Question Pairs

RTE

WNLI

Results from the Paper

Ranked #18 on Question Answering on Quora Question Pairs

Methods

1x1 Convolution • Adam • Attention Dropout • Convolution • Dense Connections • Dropout • GELU • Grouped Convolution • Layer Normalization • Linear Layer • Linear Warmup With Linear Decay • Multi-Head Attention • Residual Connection • Scaled Dot-Product Attention • Softmax • SqueezeBERT • Weight Decay • WordPiece
