Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change

ACL 2020 Hongfei XuJosef van GenabithDeyi XiongQiuhui Liu

The choice of hyper-parameters affects the performance of neural models. While much previous research (Sutskever et al., 2013; Duchi et al., 2011; Kingma and Ba, 2015) focuses on accelerating convergence and reducing the effects of the learning rate, comparatively few papers concentrate on the effect of batch size... (read more)

PDF Abstract

Code


No code implementations yet. Submit your code now

Tasks


Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods used in the Paper