no code implementations • 23 Sep 2024 • Lechao Xiao
This raises a critical question: Do the established principles that proved successful in the generalization-centric era remain valid in this new era of scaling?
no code implementations • 14 Aug 2024 • Jiri Hron, Laura Culp, Gamaleldin Elsayed, Rosanne Liu, Ben Adlam, Maxwell Bileschi, Bernd Bohnet, JD Co-Reyes, Noah Fiedel, C. Daniel Freeman, Izzeddin Gur, Kathleen Kenealy, Jaehoon Lee, Peter J. Liu, Gaurav Mishra, Igor Mordatch, Azade Nova, Roman Novak, Aaron Parisi, Jeffrey Pennington, Alex Rizkowsky, Isabelle Simpson, Hanie Sedghi, Jascha Sohl-Dickstein, Kevin Swersky, Sharad Vikram, Tris Warkentin, Lechao Xiao, Kelvin Xu, Jasper Snoek, Simon Kornblith
We find that for a fixed dataset, larger and longer-trained LMs hallucinate less.
1 code implementation • 8 Jul 2024 • Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington
Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices.
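One concrete instance of such a parameterization choice, sketched below purely for illustration (the exponents and layer categories are assumptions, not this paper's prescription), is rescaling per-layer Adam learning rates with width in the style of μP-like parameterizations:

```python
# Illustrative sketch only: width-dependent per-layer Adam learning rates in the
# spirit of muP-style parameterizations. The exponents below are assumptions
# chosen for illustration, not a recipe taken from the paper.

def layer_lr(base_lr: float, fan_in: int, layer_type: str) -> float:
    """Scale a base learning rate with layer width (fan_in)."""
    if layer_type == "embedding":   # input-like layers: width-independent
        return base_lr
    if layer_type == "hidden":      # hidden matrices: shrink as 1 / fan_in
        return base_lr / fan_in
    if layer_type == "readout":     # output layer: also shrink as 1 / fan_in
        return base_lr / fan_in
    raise ValueError(f"unknown layer type: {layer_type}")

# The same base_lr is reused across widths once rescaled per layer.
for width in (256, 1024, 4096):
    print(width, layer_lr(1e-3, fan_in=width, layer_type="hidden"))
```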
no code implementations • 23 May 2024 • Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington
We consider a solvable neural scaling model with three parameters: data complexity, target complexity, and the number of model parameters.
no code implementations • 11 Dec 2023 • Avi Singh, John D. Co-Reyes, Rishabh Agarwal, Ankesh Anand, Piyush Patil, Xavier Garcia, Peter J. Liu, James Harrison, Jaehoon Lee, Kelvin Xu, Aaron Parisi, Abhishek Kumar, Alex Alemi, Alex Rizkowsky, Azade Nova, Ben Adlam, Bernd Bohnet, Gamaleldin Elsayed, Hanie Sedghi, Igor Mordatch, Isabelle Simpson, Izzeddin Gur, Jasper Snoek, Jeffrey Pennington, Jiri Hron, Kathleen Kenealy, Kevin Swersky, Kshiteej Mahajan, Laura Culp, Lechao Xiao, Maxwell L. Bileschi, Noah Constant, Roman Novak, Rosanne Liu, Tris Warkentin, Yundi Qian, Yamini Bansal, Ethan Dyer, Behnam Neyshabur, Jascha Sohl-Dickstein, Noah Fiedel
To do so, we investigate a simple self-training method based on expectation-maximization, which we call ReST$^{EM}$, where we (1) generate samples from the model and filter them using binary feedback, (2) fine-tune the model on these samples, and (3) repeat this process a few times.
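A minimal sketch of the loop described above, with hypothetical stubs in place of an actual LM sampling and fine-tuning stack:

```python
# Minimal sketch of the generate -> filter -> fine-tune loop described above.
# The three helpers are hypothetical stubs standing in for a real LM sampling /
# fine-tuning stack and a binary-feedback checker.
import random

def generate_samples(model, prompt, n):          # stub: sample n candidate answers
    return [f"{prompt}-answer-{i}" for i in range(n)]

def passes_binary_check(prompt, sample):         # stub: e.g. exact-match correctness
    return random.random() < 0.25

def finetune(model, examples):                   # stub: return an "updated" model
    return model + len(examples)

def rest_em_sketch(model, prompts, num_rounds=3, samples_per_prompt=8):
    for _ in range(num_rounds):                           # (3) repeat a few times
        kept = [(p, s) for p in prompts                   # (1) sample, then filter
                for s in generate_samples(model, p, samples_per_prompt)
                if passes_binary_check(p, s)]             #     using binary feedback
        model = finetune(model, kept)                     # (2) fine-tune on kept samples
    return model

print(rest_em_sketch(model=0, prompts=["2+2=?", "3*7=?"]))
```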
no code implementations • 8 Nov 2023 • C. Daniel Freeman, Laura Culp, Aaron Parisi, Maxwell L Bileschi, Gamaleldin F Elsayed, Alex Rizkowsky, Isabelle Simpson, Alex Alemi, Azade Nova, Ben Adlam, Bernd Bohnet, Gaurav Mishra, Hanie Sedghi, Igor Mordatch, Izzeddin Gur, Jaehoon Lee, JD Co-Reyes, Jeffrey Pennington, Kelvin Xu, Kevin Swersky, Kshiteej Mahajan, Lechao Xiao, Rosanne Liu, Simon Kornblith, Noah Constant, Peter J. Liu, Roman Novak, Yundi Qian, Noah Fiedel, Jascha Sohl-Dickstein
We introduce and study the problem of adversarial arithmetic, which provides a simple yet challenging testbed for language model alignment.
1 code implementation • 25 Sep 2023 • Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, Alex Alemi, Ben Adlam, John D. Co-Reyes, Izzeddin Gur, Abhishek Kumar, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith
In this work, we seek ways to reproduce and study training stability and instability at smaller scales.
2 code implementations • 9 Sep 2022 • Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi
Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations.
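For background (not a result of this paper): the ReLU case is tractable because the relevant Gaussian expectation has a classical closed form, the degree-1 arc-cosine kernel, where θ denotes the angle between x and x':

```latex
% Degree-1 arc-cosine kernel: closed form of the ReLU Gaussian expectation,
% with \theta the angle between x and x'. Shown for context only.
\mathbb{E}_{w \sim \mathcal{N}(0, I_d)}\!\left[\operatorname{ReLU}(w^\top x)\,\operatorname{ReLU}(w^\top x')\right]
  = \frac{\lVert x \rVert\,\lVert x' \rVert}{2\pi}\left(\sin\theta + (\pi - \theta)\cos\theta\right)
```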
no code implementations • 11 Jul 2022 • Lechao Xiao, Jeffrey Pennington
Although learning in high dimensions is commonly believed to suffer from the curse of dimensionality, modern machine learning methods often exhibit an astonishing power to tackle a wide range of challenging real-world learning problems without requiring abundant data.
no code implementations • 30 May 2022 • Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington
As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes.
no code implementations • 10 Dec 2021 • Lechao Xiao
It is well-known that the eigenstructure of infinite-width multilayer perceptrons (MLPs) depends solely on the concept frequency, which measures the order of interactions.
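As an illustrative restatement under an assumed Boolean-cube input domain (not necessarily the paper's exact setting): the kernel of an infinite-width MLP diagonalizes in the parity basis, and each eigenvalue depends only on the degree of the parity, i.e. the order of interaction among input coordinates:

```latex
% Illustrative restatement on the Boolean cube \{-1,+1\}^d (assumed domain, not the
% paper's exact setting): parity basis \chi_S(x) = \prod_{i \in S} x_i, with the
% eigenvalue \lambda_{|S|} depending only on the degree |S| (the order of interaction).
K(x, x') \;=\; \sum_{S \subseteq [d]} \lambda_{|S|}\, \chi_S(x)\, \chi_S(x')
```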
2 code implementations • NeurIPS 2021 • Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee
The effectiveness of machine learning algorithms arises from being able to extract useful features from large amounts of data.
no code implementations • NeurIPS 2021 • Lechao Xiao, Jeffrey Pennington
By computing an eigen-decomposition of the infinite-width limits (aka Neural Kernels) of these architectures, we characterize how inductive biases (locality, weight-sharing, pooling, etc.) and the breaking of spurious symmetries can affect the performance of these learning systems.
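A sketch of this kind of computation using the open-source Neural Tangents library, with a placeholder convolutional architecture and random inputs (chosen for illustration; not the exact architectures studied in the paper):

```python
# Sketch: compute an infinite-width (NNGP) kernel with Neural Tangents and
# inspect its spectrum. The architecture and random inputs are placeholders.
import jax.numpy as jnp
from jax import random
from neural_tangents import stax

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Conv(128, (3, 3), padding='SAME'), stax.Relu(),   # locality + weight sharing
    stax.GlobalAvgPool(),                                   # pooling
    stax.Dense(10))

x = random.normal(random.PRNGKey(0), (64, 8, 8, 3))  # 64 toy "images"
k_nngp = kernel_fn(x, x, 'nngp')                      # (64, 64) NNGP kernel matrix
eigvals = jnp.linalg.eigvalsh(k_nngp)                 # spectrum of the kernel
print(eigvals[-5:])                                   # a few leading eigenvalues
```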
no code implementations • ICLR 2021 • Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek
This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.
no code implementations • NeurIPS 2020 • Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein
We perform a careful, thorough, and large-scale empirical study of the correspondence between wide neural networks and kernel methods.
no code implementations • NeurIPS 2020 • Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington
Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes.
no code implementations • ICLR 2020 • Wei Hu, Lechao Xiao, Jeffrey Pennington
The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance.
no code implementations • ICML 2020 • Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz
A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data.
3 code implementations • ICLR 2020 • Roman Novak, Lechao Xiao, Jiri Hron, Jaehoon Lee, Alexander A. Alemi, Jascha Sohl-Dickstein, Samuel S. Schoenholz
Neural Tangents is a library designed to enable research into infinite-width neural networks.
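A minimal usage sketch in the spirit of the library's documentation, with placeholder data; the architecture and regularizer value are arbitrary choices for illustration:

```python
# Minimal usage sketch of Neural Tangents: define an infinite-width network
# and run exact GP inference with its NNGP kernel. Data are random placeholders.
import neural_tangents as nt
from neural_tangents import stax
from jax import random

init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(),
    stax.Dense(512), stax.Relu(),
    stax.Dense(1))

k1, k2, k3 = random.split(random.PRNGKey(0), 3)
x_train = random.normal(k1, (20, 32))
y_train = random.normal(k2, (20, 1))
x_test = random.normal(k3, (5, 32))

predict_fn = nt.predict.gradient_descent_mse_ensemble(
    kernel_fn, x_train, y_train, diag_reg=1e-4)
y_mean = predict_fn(x_test=x_test, get='nngp')   # posterior mean under the NNGP
print(y_mean.shape)
```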
no code implementations • 25 Sep 2019 • Lechao Xiao, Jeffrey Pennington, Sam Schoenholz
In this paper, we discuss these challenging issues in the context of wide neural networks at large depth, where the situation simplifies considerably.
no code implementations • ICLR 2019 • Roman Novak, Lechao Xiao, Yasaman Bahri, Jaehoon Lee, Greg Yang, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein
There is a previously identified equivalence between wide fully connected neural networks (FCNs) and Gaussian processes (GPs).
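For context, the standard NNGP kernel recursion underlying this equivalence, written in generic notation (σ_w², σ_b² are weight and bias variances, φ the nonlinearity, d the input dimension); this is the textbook statement rather than the paper's specific notation:

```latex
% Standard NNGP recursion for a fully connected network with nonlinearity \phi
% (generic notation, not necessarily the paper's). Here (u, v) is a centered Gaussian
% pair with covariance given by K^{(l)} evaluated at (x, x), (x, x'), (x', x').
K^{(0)}(x, x') = \frac{\sigma_w^2}{d}\, x^\top x' + \sigma_b^2,
\qquad
K^{(l+1)}(x, x') = \sigma_w^2\, \mathbb{E}\bigl[\phi(u)\,\phi(v)\bigr] + \sigma_b^2 .
```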
1 code implementation • NeurIPS 2019 • Jaehoon Lee, Lechao Xiao, Samuel S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Sohl-Dickstein, Jeffrey Pennington
A longstanding goal in deep learning research has been to precisely characterize training and generalization.
3 code implementations • ICML 2018 • Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington
In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme.
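A small NumPy sketch in the spirit of a delta-style orthogonal convolution initializer (an orthogonal matrix at the filter's spatial center, zeros elsewhere); the gain and layout conventions are illustrative assumptions rather than the paper's exact recipe:

```python
# Sketch of a delta-style orthogonal conv initializer: an orthogonal matrix at
# the filter's spatial center, zeros elsewhere. Gain and HWIO layout are
# illustrative assumptions, not the paper's exact recipe.
import numpy as np

def delta_orthogonal(ksize, c_in, c_out, gain=1.0, seed=0):
    assert c_out >= c_in, "an isometry on input channels requires c_out >= c_in"
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.normal(size=(c_out, c_out)))   # random orthogonal matrix
    w = np.zeros((ksize, ksize, c_in, c_out))               # HWIO filter layout
    mid = ksize // 2
    w[mid, mid] = gain * q[:, :c_in].T                      # (c_in, c_out) isometry
    return w

w = delta_orthogonal(ksize=3, c_in=64, c_out=64)
# The center tap acts as an isometry on the input channels.
print(np.allclose(w[1, 1] @ w[1, 1].T, np.eye(64)))
```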