Search Results for author: Zachary Nado

Found 12 papers, 4 papers with code

A Loss Curvature Perspective on Training Instability in Deep Learning

no code implementations 8 Oct 2021 Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.

A Loss Curvature Perspective on Training Instabilities of Deep Learning Models

no code implementations ICLR 2022 Justin Gilmer, Behrooz Ghorbani, Ankush Garg, Sneha Kudugunta, Behnam Neyshabur, David Cardoze, George Edward Dahl, Zachary Nado, Orhan Firat

In this work, we study the evolution of the loss Hessian across many classification tasks in order to understand the effect the curvature of the loss has on the training dynamics.

Pre-training helps Bayesian optimization too

2 code implementations 16 Sep 2021 Zi Wang, George E. Dahl, Kevin Swersky, Chansoo Lee, Zelda Mariet, Zachary Nado, Justin Gilmer, Jasper Snoek, Zoubin Ghahramani

Contrary to a common belief that BO is suited to optimizing black-box functions, it actually requires domain knowledge on characteristics of those functions to deploy BO successfully.

Revisiting One-vs-All Classifiers for Predictive Uncertainty and Out-of-Distribution Detection in Neural Networks

no code implementations 10 Jul 2020 Shreyas Padhy, Zachary Nado, Jie Ren, Jeremiah Liu, Jasper Snoek, Balaji Lakshminarayanan

Accurate estimation of predictive uncertainty in modern neural networks is critical to achieve well calibrated predictions and detect out-of-distribution (OOD) inputs.

OOD Detection, Out-of-Distribution Detection

Evaluating Prediction-Time Batch Normalization for Robustness under Covariate Shift

no code implementations 19 Jun 2020 Zachary Nado, Shreyas Padhy, D. Sculley, Alexander D'Amour, Balaji Lakshminarayanan, Jasper Snoek

Using this one line code change, we achieve state-of-the-art on recent covariate shift benchmarks and an mCE of 60.28% on the challenging ImageNet-C dataset; to our knowledge, this is the best result for any model that does not incorporate additional data augmentation or modification of the training pipeline.

Data Augmentation

On Empirical Comparisons of Optimizers for Deep Learning

no code implementations 11 Oct 2019 Dami Choi, Christopher J. Shallue, Zachary Nado, Jaehoon Lee, Chris J. Maddison, George E. Dahl

In particular, we find that the popular adaptive gradient methods never underperform momentum or gradient descent.

Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model

1 code implementation NeurIPS 2019 Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger Grosse

Increasing the batch size is a popular way to speed up neural network training, but beyond some critical batch size, larger batch sizes yield diminishing returns.

Can You Trust Your Model's Uncertainty? Evaluating Predictive Uncertainty Under Dataset Shift

2 code implementations NeurIPS 2019 Yaniv Ovadia, Emily Fertig, Jie Ren, Zachary Nado, D. Sculley, Sebastian Nowozin, Joshua V. Dillon, Balaji Lakshminarayanan, Jasper Snoek

Modern machine learning methods including deep learning have achieved great success in predictive accuracy for supervised learning tasks, but may still fall short in giving useful estimates of their predictive uncertainty.

Probabilistic Deep Learning

AutoGraph: Imperative-style Coding with Graph-based Performance

no code implementations 16 Oct 2018 Dan Moldovan, James M Decker, Fei Wang, Andrew A Johnson, Brian K. Lee, Zachary Nado, D. Sculley, Tiark Rompf, Alexander B. Wiltschko

In machine learning, imperative style libraries like Autograd and PyTorch are easy to write, but suffer from high interpretive overhead and are not easily deployable in production or mobile settings.
