Search Results for author: Jonathan Hseu

Found 2 papers, 2 papers with code

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes

24 code implementations • ICLR 2020 • Yang You, Jing Li, Sashank Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

In this paper, we first study a principled layerwise adaptation strategy to accelerate training of deep neural networks using large mini-batches.

Ranked #11 on Question Answering on SQuAD1.1 dev (F1 metric)

Question Answering Stochastic Optimization

1,847

Paper
Code

Large-Batch Training for LSTM and Beyond

1 code implementation • 24 Jan 2019 • Yang You, Jonathan Hseu, Chris Ying, James Demmel, Kurt Keutzer, Cho-Jui Hsieh

LEGW enables Sqrt Scaling scheme to be useful in practice and as a result we achieve much better results than the Linear Scaling learning rate scheme.

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.