no code implementations • 18 Apr 2024 • Yufan Li, Subhabrata Sen, Ben Adlam
In the transfer learning paradigm models learn useful representations (or features) during a data-rich pretraining stage, and then use the pretrained representation to improve model performance on data-scarce downstream tasks.
no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel
To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.
no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein
We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.
no code implementations • 25 Sep 2023 • Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
In this work, we seek ways to reproduce and study training stability and instability at smaller scales.
no code implementations • 9 Mar 2023 • Ben Adlam, Jaehoon Lee, Shreyas Padhy, Zachary Nado, Jasper Snoek
Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.
no code implementations • 21 Jun 2022 • Neha Gupta, Jamie Smith, Ben Adlam, Zelda Mariet
Empirically, standard ensembling reducesthe bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction. We conclude by an empirical analysis of recent deep learning methods that ensemble over hyperparameters, revealing that these techniques indeed favor bias reduction.
no code implementations • 15 Jun 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems.
no code implementations • 14 May 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington
By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of a solution of a Volterra integral equation.
no code implementations • 8 Feb 2022 • Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith
We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself.
no code implementations • NeurIPS 2021 • Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington
A significant obstacle in the development of robust machine learning models is \emph{covariate shift}, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same.
BIG-bench Machine Learning Out-of-Distribution Generalization +1
no code implementations • 16 Nov 2021 • Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington
A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same.
BIG-bench Machine Learning Out-of-Distribution Generalization +2
no code implementations • ICLR 2021 • Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek
This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.
no code implementations • 6 Nov 2020 • Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley
Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains.
no code implementations • NeurIPS 2020 • Ben Adlam, Jeffrey Pennington
Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function.
1 code implementation • 14 Oct 2020 • Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek
This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.
no code implementations • ICML 2020 • Ben Adlam, Jeffrey Pennington
Modern deep learning models employ considerably more parameters than required to fit the training data.
no code implementations • 31 Jul 2020 • Ben Adlam, Jasper Snoek, Samuel L. Smith
Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect).
no code implementations • NeurIPS 2020 • Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods.
no code implementations • NeurIPS 2020 • Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington
Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes.
no code implementations • 2 Dec 2019 • Ben Adlam, Jake Levinson, Jeffrey Pennington
In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity.
no code implementations • 30 Oct 2019 • Ben Adlam, Charles Weill, Amol Kapoor
We investigate under and overfitting in Generative Adversarial Networks (GANs), using discriminators unseen by the generator to measure generalization.
no code implementations • NeurIPS 2019 • Ben Adlam, Corinna Cortes, Mehryar Mohri, Ningshan Zhang
Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions.
no code implementations • 25 Sep 2019 • Ben Adlam, Jake Levinson, Jeffrey Pennington
One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions.
1 code implementation • 30 Apr 2019 • Charles Weill, Javier Gonzalvo, Vitaly Kuznetsov, Scott Yang, Scott Yak, Hanna Mazzawi, Eugen Hotaj, Ghassen Jerfel, Vladimir Macko, Ben Adlam, Mehryar Mohri, Corinna Cortes
AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention.