How to Train BERT with an Academic Budget

EMNLP 2021 · Peter Izsak, Moshe Berchansky, Omer Levy

While large language models à la BERT are used ubiquitously in NLP, pretraining them is considered a luxury that only a few well-funded industry labs can afford. How can one train such models with a more modest budget? We present a recipe for pretraining a masked language model in 24 hours using a single low-end deep learning server. We demonstrate that through a combination of software optimizations, design choices, and hyperparameter tuning, it is possible to produce models that are competitive with BERT-base on GLUE tasks at a fraction of the original pretraining cost.
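For readers unfamiliar with masked language model pretraining, the sketch below shows what a minimal single-server setup looks like using HuggingFace Transformers. This is not the paper's training code: the corpus (wikitext-103), sequence length, batch size, learning rate, and step budget are illustrative placeholders, and the paper's software optimizations and tuned hyperparameters are omitted.

```python
# Minimal sketch of masked language model (MLM) pretraining with HuggingFace
# Transformers. All hyperparameters below are illustrative, NOT the paper's
# tuned recipe.
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    BertConfig,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Any raw-text corpus works; wikitext-103 is used here purely as an example.
raw = load_dataset("wikitext", "wikitext-103-raw-v1", split="train")

def tokenize(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=128,                      # short sequences keep the sketch cheap
        return_special_tokens_mask=True,
    )

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking: 15% of tokens are masked on the fly at each step.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Randomly initialized BERT-base-sized encoder trained from scratch.
model = BertForMaskedLM(BertConfig())

args = TrainingArguments(
    output_dir="mlm-checkpoints",
    per_device_train_batch_size=32,          # illustrative
    gradient_accumulation_steps=8,           # emulate a larger effective batch
    learning_rate=1e-4,                      # illustrative
    max_steps=100_000,                       # budget-limited, not epoch-based
    fp16=True,                               # mixed precision for throughput
    logging_steps=500,
    save_steps=10_000,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

Training under a fixed wall-clock budget (as in the paper) means the number of optimizer steps, not the number of epochs, is the binding constraint, which is why the sketch caps training with `max_steps` rather than `num_train_epochs`.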

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Linguistic Acceptability | CoLA | 24hBERT | Accuracy | 57.1 | #33 |
| Semantic Textual Similarity | MRPC | 24hBERT | Accuracy | 87.5 | #25 |
| Natural Language Inference | MultiNLI | 24hBERT | Matched | 84.4 | #30 |
| Natural Language Inference | MultiNLI | 24hBERT | Mismatched | 83.8 | #21 |
| Natural Language Inference | QNLI | 24hBERT | Accuracy | 90.6 | #32 |
| Question Answering | Quora Question Pairs | 24hBERT | Accuracy | 70.7 | #19 |
| Natural Language Inference | RTE | 24hBERT | Accuracy | 57.7 | #79 |
| Sentiment Analysis | SST-2 Binary classification | 24hBERT | Accuracy | 93.0 | #44 |
| Semantic Textual Similarity | STS Benchmark | 24hBERT | Pearson Correlation | 0.820 | #27 |
