Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

ICLR 2020. Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

Training large deep neural networks on massive datasets is computationally very challenging. There has been a recent surge in interest in using large batch stochastic optimization methods to tackle this issue...

Evaluation Results from the Paper


TASK                DATASET       MODEL                        METRIC NAME  METRIC VALUE  GLOBAL RANK
Question Answering  SQuAD1.1 dev  BERT large (LAMB optimizer)  F1           90.584        #9