no code implementations • NeurIPS 2021 • Zachary Nado, Justin M. Gilmer, Christopher J. Shallue, Rohan Anil, George E. Dahl
Recently, the LARS and LAMB optimizers have been proposed to speed up neural network training with large batch sizes.
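The layer-wise scaling rule at the heart of LARS can be sketched in a few lines of NumPy. This is a minimal illustration only: the function name, constants, and defaults below are hypothetical choices, not taken from the paper.

```python
import numpy as np

def lars_step(w, grad, lr=0.1, trust_coef=0.001, weight_decay=1e-4):
    """One illustrative LARS step for a single layer's weights.

    The global learning rate is rescaled per layer by a "trust ratio"
    proportional to ||w|| / (||grad|| + weight_decay * ||w||), so layers
    whose gradients are large relative to their weights take smaller steps.
    """
    w_norm = np.linalg.norm(w)
    g_norm = np.linalg.norm(grad)
    if w_norm > 0 and g_norm > 0:
        local_lr = trust_coef * w_norm / (g_norm + weight_decay * w_norm)
    else:
        # Fall back to the plain learning rate for degenerate layers.
        local_lr = 1.0
    return w - lr * local_lr * (grad + weight_decay * w)
```

The per-layer rescaling is what lets a single global learning rate remain usable across layers with very different weight and gradient scales, which is the regime large-batch training tends to stress.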
1 code implementation • 30 Oct 2020 • Zoe L. de Beurs, Andrew Vanderburg, Christopher J. Shallue, Xavier Dumusque, Andrew Collier Cameron, Christopher Leet, Lars A. Buchhave, Rosario Cosentino, Adriano Ghedina, Raphaëlle D. Haywood, Nicholas Langellier, David W. Latham, Mercedes López-Morales, Michel Mayor, Giusi Micela, Timothy W. Milbourne, Annelies Mortier, Emilio Molinari, Francesco Pepe, David F. Phillips, Matteo Pinamonti, Giampaolo Piotto, Ken Rice, Dimitar Sasselov, Alessandro Sozzetti, Stéphane Udry, Christopher A. Watson
We trained our machine learning models on both simulated data (generated with the SOAP 2.0 software; Dumusque et al. 2014) and observations of the Sun from the HARPS-N Solar Telescope (Dumusque et al. 2015; Phillips et al. 2016; Collier Cameron et al. 2019).
1 code implementation • 11 Oct 2019 • Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl
In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent.
1 code implementation • 12 Jul 2019 • Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl
In the twilight of Moore's law, GPUs and other specialized hardware accelerators have dramatically sped up neural network training.
1 code implementation • NeurIPS 2019 • Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse
Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.
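The diminishing-returns behavior described above can be illustrated with a simple steps-versus-batch-size model: below a critical batch size, doubling the batch roughly halves the steps needed to reach a target; far above it, extra batch size buys almost nothing. The functional form and the constants `s_min` and `b_crit` here are hypothetical, chosen only for illustration.

```python
def steps_to_target(batch_size, s_min=1000.0, b_crit=256.0):
    """Illustrative model of training steps needed to reach a fixed target.

    For batch_size << b_crit, steps scale like s_min * b_crit / batch_size
    (near-perfect scaling); for batch_size >> b_crit, steps flatten out
    near the floor s_min (diminishing returns).
    """
    return s_min * (1.0 + b_crit / batch_size)

# Small batches: doubling the batch nearly halves the steps.
# Large batches: steps barely move, approaching the floor s_min.
for b in (32, 64, 256, 8192):
    print(b, steps_to_target(b))
```

Under this toy model, going from batch 32 to 64 cuts steps from 9000 to 5000, while going from 256 to 8192 only reduces them from 2000 to about 1031: the batch size grew 32x for a sub-2x speedup.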
2 code implementations • 4 Apr 2019 • Liang Yu, Andrew Vanderburg, Chelsea Huang, Christopher J. Shallue, Ian J. M. Crossfield, B. Scott Gaudi, Tansu Daylan, Anne Dattilo, David J. Armstrong, George R. Ricker, Roland K. Vanderspek, David W. Latham, Sara Seager, Jason Dittmann, John P. Doty, Ana Glidden, Samuel N. Quinn
We apply our model to new data from Sector 6, and present 335 new signals that received the highest scores in triage and vetting and were also identified as planet candidates by human vetters.
Earth and Planetary Astrophysics
no code implementations • 8 Nov 2018 • Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl
Along the way, we show that disagreements in the literature on how batch size affects model quality can largely be explained by differences in metaparameter tuning and compute budgets at different batch sizes.
no code implementations • WS 2018 • Bhuwan Dhingra, Christopher J. Shallue, Mohammad Norouzi, Andrew M. Dai, George E. Dahl
Ideally, we could incorporate our prior knowledge of this hierarchical structure into unsupervised learning algorithms that work on text data.