Search Results for author: Ben Adlam

Found 24 papers, 2 papers with code

Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis

no code implementations • 18 Apr 2024 • Yufan Li, Subhabrata Sen, Ben Adlam

In the transfer learning paradigm models learn useful representations (or features) during a data-rich pretraining stage, and then use the pretrained representation to improve model performance on data-scarce downstream tasks.

Transfer Learning

Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models

no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel

To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.

Math

Frontier Language Models are not Robust to Adversarial Arithmetic, or "What do I need to say so you agree 2+2=5?"

no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein

We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.

Language Modelling

Small-scale proxies for large-scale Transformer training instabilities

no code implementations • 25 Sep 2023 • Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith

In this work, we seek ways to reproduce and study training stability and instability at smaller scales.

Kernel Regression with Infinite-Width Neural Networks on Millions of Examples

no code implementations • 9 Mar 2023 • Ben Adlam, Jaehoon Lee, Shreyas Padhy, Zachary Nado, Jasper Snoek

Using this approach, we study scaling laws of several neural kernels across many orders of magnitude for the CIFAR-5m dataset.

Data Augmentation • Regression

Ensembling over Classifiers: a Bias-Variance Perspective

no code implementations • 21 Jun 2022 • Neha Gupta, Jamie Smith, Ben Adlam, Zelda Mariet

Empirically, standard ensembling reduces the bias, leading us to hypothesize that ensembles of classifiers may perform well in part because of this unexpected reduction. We conclude with an empirical analysis of recent deep learning methods that ensemble over hyperparameters, revealing that these techniques indeed favor bias reduction.

Implicit Regularization or Implicit Conditioning? Exact Risk Trajectories of SGD in High Dimensions

no code implementations • 15 Jun 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

Stochastic gradient descent (SGD) is a pillar of modern machine learning, serving as the go-to optimization algorithm for a diverse array of problems.

Computational Efficiency

Homogenization of SGD in high-dimensions: Exact dynamics and generalization properties

no code implementations • 14 May 2022 • Courtney Paquette, Elliot Paquette, Ben Adlam, Jeffrey Pennington

By analyzing homogenized SGD, we provide exact non-asymptotic high-dimensional expressions for the generalization performance of SGD in terms of a solution of a Volterra integral equation.

Understanding the bias-variance tradeoff of Bregman divergences

no code implementations • 8 Feb 2022 • Ben Adlam, Neha Gupta, Zelda Mariet, Jamie Smith

We show that, similarly to the label, the central prediction can be interpreted as the mean of a random variable, where the mean operates in a dual space defined by the loss function itself.

Overparameterization Improves Robustness to Covariate Shift in High Dimensions

no code implementations • NeurIPS 2021 • Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same.

BIG-bench Machine Learning • Out-of-Distribution Generalization • +1

Covariate Shift in High-Dimensional Random Feature Regression

no code implementations • 16 Nov 2021 • Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington

A significant obstacle in the development of robust machine learning models is covariate shift, a form of distribution shift that occurs when the input distributions of the training and test sets differ while the conditional label distributions remain the same.

BIG-bench Machine Learning • Out-of-Distribution Generalization • +2

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

no code implementations • ICLR 2021 • Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.

General Classification • Multi-class Classification • +1

Underspecification Presents Challenges for Credibility in Modern Machine Learning

no code implementations • 6 Nov 2020 • Alexander D'Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley

Predictors returned by underspecified pipelines are often treated as equivalent based on their training domain performance, but we show here that such predictors can behave very differently in deployment domains.

BIG-bench Machine Learning

Understanding Double Descent Requires a Fine-Grained Bias-Variance Decomposition

no code implementations • NeurIPS 2020 • Ben Adlam, Jeffrey Pennington

Classical learning theory suggests that the optimal generalization performance of a machine learning model should occur at an intermediate model complexity, with simpler models exhibiting high bias and more complex models exhibiting high variance of the predictive function.

Ensemble Learning • Learning Theory

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

1 code implementation • 14 Oct 2020 • Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.

General Classification • Multi-class Classification • +1

1,365

The Neural Tangent Kernel in High Dimensions: Triple Descent and a Multi-Scale Theory of Generalization

no code implementations • ICML 2020 • Ben Adlam, Jeffrey Pennington

Modern deep learning models employ considerably more parameters than required to fit the training data.

regression

Cold Posteriors and Aleatoric Uncertainty

no code implementations • 31 Jul 2020 • Ben Adlam, Jasper Snoek, Samuel L. Smith

Recent work has observed that one can outperform exact inference in Bayesian neural networks by tuning the "temperature" of the posterior on a validation set (the "cold posterior" effect).

valid

Finite Versus Infinite Neural Networks: an Empirical Study

no code implementations • NeurIPS 2020 • Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods.

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

no code implementations • NeurIPS 2020 • Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes.

A Random Matrix Perspective on Mixtures of Nonlinearities for Deep Learning

no code implementations • 2 Dec 2019 • Ben Adlam, Jake Levinson, Jeffrey Pennington

In this work, we focus on this high-dimensional regime in which both the dataset size and the number of features tend to infinity.

Investigating Under and Overfitting in Wasserstein Generative Adversarial Networks

no code implementations • 30 Oct 2019 • Ben Adlam, Charles Weill, Amol Kapoor

We investigate under and overfitting in Generative Adversarial Networks (GANs), using discriminators unseen by the generator to measure generalization.

Learning GANs and Ensembles Using Discrepancy

no code implementations • NeurIPS 2019 • Ben Adlam, Corinna Cortes, Mehryar Mohri, Ningshan Zhang

Generative adversarial networks (GANs) generate data based on minimizing a divergence between two distributions.

Domain Adaptation

A Random Matrix Perspective on Mixtures of Nonlinearities in High Dimensions

no code implementations • 25 Sep 2019 • Ben Adlam, Jake Levinson, Jeffrey Pennington

One of the distinguishing characteristics of modern deep learning systems is that they typically employ neural network architectures that utilize enormous numbers of parameters, often in the millions and sometimes even in the billions.

AdaNet: A Scalable and Flexible Framework for Automatically Learning Ensembles

1 code implementation • 30 Apr 2019 • Charles Weill, Javier Gonzalvo, Vitaly Kuznetsov, Scott Yang, Scott Yak, Hanna Mazzawi, Eugen Hotaj, Ghassen Jerfel, Vladimir Macko, Ben Adlam, Mehryar Mohri, Corinna Cortes

AdaNet is a lightweight TensorFlow-based (Abadi et al., 2015) framework for automatically learning high-quality ensembles with minimal expert intervention.

Neural Architecture Search

3,470
