Search Results for author: Lechao Xiao

Found 25 papers, 8 papers with code

Rethinking Conventional Wisdom in Machine Learning: From Generalization to Scaling

no code implementations 23 Sep 2024 Lechao Xiao

This raises a critical question: Do the established principles that proved successful in the generalization-centric era remain valid in this new era of scaling?

L2 Regularization · Language Modelling · +1

Scaling Exponents Across Parameterizations and Optimizers

1 code implementation 8 Jul 2024 Katie Everett, Lechao Xiao, Mitchell Wortsman, Alexander A. Alemi, Roman Novak, Peter J. Liu, Izzeddin Gur, Jascha Sohl-Dickstein, Leslie Pack Kaelbling, Jaehoon Lee, Jeffrey Pennington

Robust and effective scaling of models from small to large width typically requires the precise adjustment of many algorithmic and architectural details, such as parameterization and optimizer choices.
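
The paper studies how per-layer hyperparameters should change as width grows. The sketch below is a minimal illustration of width-dependent per-layer learning rates; the exponents are generic muP-style choices made up for illustration and are not the exponents measured or recommended in the paper.

    # Hedged sketch: per-layer learning rates that shrink with width.
    # The exponents are illustrative muP-style choices, not the paper's.
    def per_layer_learning_rates(width, base_width=256, base_lr=1e-3):
        exponents = {
            "embedding": 0.0,   # input/embedding layers: width-independent
            "hidden": 1.0,      # hidden layers: lr ~ 1/width
            "readout": 1.0,     # readout layer: lr ~ 1/width
        }
        scale = base_width / width
        return {name: base_lr * scale ** p for name, p in exponents.items()}

    for w in (256, 1024, 4096):
        print(w, per_layer_learning_rates(w))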

4+3 Phases of Compute-Optimal Neural Scaling Laws

no code implementations 23 May 2024 Elliot Paquette, Courtney Paquette, Lechao Xiao, Jeffrey Pennington

We consider the solvable neural scaling model with three parameters: data complexity, target complexity, and model-parameter-count.
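
As a loose illustration of compute-optimal scaling (not the solvable model analyzed in the paper), the sketch below numerically finds the parameter/data split that minimizes a generic Chinchilla-style power-law loss under a fixed compute budget; every constant is made up.

    # Hedged sketch: compute-optimal split for a generic power-law loss
    # L(N, D) = A/N**a + B/D**b under the approximation C ~ 6*N*D.
    # Constants are illustrative, not from the paper.
    import numpy as np

    A, B, a, b = 400.0, 410.0, 0.34, 0.28
    def loss(N, D):
        return A / N**a + B / D**b

    for C in (1e18, 1e20, 1e22):
        N = np.logspace(6, 12, 2000)          # candidate parameter counts
        D = C / (6.0 * N)                     # data implied by the budget
        i = np.argmin(loss(N, D))
        print(f"C={C:.0e}  N*={N[i]:.2e}  D*={D[i]:.2e}")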

Fast Neural Kernel Embeddings for General Activations

2 code implementations 9 Sep 2022 Insu Han, Amir Zandieh, Jaehoon Lee, Roman Novak, Lechao Xiao, Amin Karbasi

Moreover, most prior works on neural kernels have focused on the ReLU activation, mainly due to its popularity but also due to the difficulty of computing such kernels for general activations.
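
For context, exact infinite-width kernels for non-ReLU activations can be computed with the neural_tangents library; the sketch below assumes its stax API and uses Erf as the non-ReLU activation. This computes the exact kernel that fast embeddings are meant to approximate, not the paper's sketching method itself.

    # Hedged sketch: exact NNGP/NTK kernels for a non-ReLU activation,
    # assuming the neural_tangents stax API.
    import jax.numpy as jnp
    from jax import random
    from neural_tangents import stax

    init_fn, apply_fn, kernel_fn = stax.serial(
        stax.Dense(512), stax.Erf(),     # Erf: a non-ReLU activation
        stax.Dense(512), stax.Erf(),
        stax.Dense(1),
    )

    x1 = random.normal(random.PRNGKey(0), (8, 32))
    x2 = random.normal(random.PRNGKey(1), (4, 32))
    kernels = kernel_fn(x1, x2, ("nngp", "ntk"))
    print(kernels.nngp.shape, kernels.ntk.shape)   # (8, 4) each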

Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm

no code implementations 11 Jul 2022 Lechao Xiao, Jeffrey Pennington

Although learning in high dimensions is commonly believed to suffer from the curse of dimensionality, modern machine learning methods often exhibit an astonishing power to tackle a wide range of challenging real-world learning problems without using abundant amounts of data.

Open-Ended Question Answering · Triplet

Precise Learning Curves and Higher-Order Scaling Limits for Dot Product Kernel Regression

no code implementations 30 May 2022 Lechao Xiao, Hong Hu, Theodor Misiakiewicz, Yue M. Lu, Jeffrey Pennington

As modern machine learning models continue to advance the computational frontier, it has become increasingly important to develop precise estimates for expected performance improvements under different model and data scaling regimes.

regression
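
A minimal numpy sketch of the object being analyzed, ridge regression with a dot-product kernel, is below; the cubic kernel, the polynomial target, and all constants are illustrative stand-ins rather than the setting of the paper's precise learning curves.

    # Hedged sketch: empirical learning curve for ridge regression with a
    # dot-product (polynomial) kernel. Target and constants are illustrative.
    import numpy as np

    rng = np.random.default_rng(0)
    d, lam = 50, 1e-3

    def dot_kernel(X, Z):                      # k(x, z) = (1 + x.z/d)**3
        return (1.0 + X @ Z.T / d) ** 3

    def target(X):                             # a low-degree polynomial target
        return X[:, 0] * X[:, 1] + 0.5 * X[:, 2]

    Xte = rng.standard_normal((2000, d)); yte = target(Xte)
    for n in (100, 400, 1600):
        Xtr = rng.standard_normal((n, d)); ytr = target(Xtr)
        alpha = np.linalg.solve(dot_kernel(Xtr, Xtr) + lam * np.eye(n), ytr)
        pred = dot_kernel(Xte, Xtr) @ alpha
        print(f"n={n:5d}  test MSE={np.mean((pred - yte) ** 2):.4f}")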

Eigenspace Restructuring: a Principle of Space and Frequency in Neural Networks

no code implementations 10 Dec 2021 Lechao Xiao

It is well-known that the eigenstructure of infinite-width multilayer perceptrons (MLPs) depends solely on the concept frequency, which measures the order of interactions.

Dataset Distillation with Infinitely Wide Convolutional Networks

2 code implementations NeurIPS 2021 Timothy Nguyen, Roman Novak, Lechao Xiao, Jaehoon Lee

The effectiveness of machine learning algorithms arises from being able to extract useful features from large amounts of data.

Dataset Distillation · Image Classification · +1
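
A toy sketch of kernel-ridge-regression-based distillation in the spirit of this line of work is below: a small synthetic support set is optimized so that its KRR predictor fits real labels. It uses a plain polynomial kernel and Gaussian toy data rather than the infinite-width convolutional kernels and image datasets used in the paper.

    # Hedged toy sketch of KRR-based distillation: learn a tiny synthetic
    # support set (Xs, ys) whose kernel ridge regression predictor fits the
    # "real" labels yt. Kernel and data are illustrative stand-ins.
    import jax, jax.numpy as jnp

    def kernel(X, Z):
        return (1.0 + X @ Z.T) ** 2

    def distill_loss(Xs, ys, Xt, yt, lam=1e-3):
        K_ss = kernel(Xs, Xs) + lam * jnp.eye(Xs.shape[0])
        preds = kernel(Xt, Xs) @ jnp.linalg.solve(K_ss, ys)
        return jnp.mean((preds - yt) ** 2)

    Xt = jax.random.normal(jax.random.PRNGKey(0), (256, 20))        # toy "real" data
    yt = jnp.sign(Xt[:, 0] * Xt[:, 1])
    Xs = jax.random.normal(jax.random.PRNGKey(1), (10, 20))          # 10 synthetic points
    ys = jnp.zeros(10)

    grad_fn = jax.jit(jax.grad(distill_loss, argnums=(0, 1)))
    for _ in range(200):                                             # plain gradient descent
        gX, gy = grad_fn(Xs, ys, Xt, yt)
        Xs, ys = Xs - 0.1 * gX, ys - 0.1 * gy
    print(distill_loss(Xs, ys, Xt, yt))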

What Breaks The Curse of Dimensionality in Deep Learning?

no code implementations NeurIPS 2021 Lechao Xiao, Jeffrey Pennington

By computing an eigen-decomposition of the infinite-width limits (aka Neural Kernels) of these architectures, we characterize how inductive biases (locality, weight-sharing, pooling, etc.) and the breaking of spurious symmetries can affect the performance of these learning systems.

Deep Learning · Open-Ended Question Answering
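
As a small illustration of the kind of computation involved (not the paper's analysis, which covers deeper and convolutional kernels), the sketch below forms the Gram matrix of the one-hidden-layer ReLU NNGP kernel, i.e. the order-1 arc-cosine kernel up to scaling conventions, on points from the sphere and inspects its spectrum.

    # Hedged sketch: spectrum of the one-hidden-layer ReLU NNGP kernel
    # (the order-1 arc-cosine kernel, up to scaling conventions) on
    # points drawn from the sphere. Illustrative only.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 500, 20
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # project onto the sphere

    cos = np.clip(X @ X.T, -1.0, 1.0)
    theta = np.arccos(cos)
    K = (np.sin(theta) + (np.pi - theta) * cos) / np.pi  # arc-cosine kernel, degree 1

    eigvals = np.linalg.eigvalsh(K)[::-1]
    print(eigvals[:10] / n)                              # leading (normalized) eigenvalues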

Exploring the Uncertainty Properties of Neural Networks’ Implicit Priors in the Infinite-Width Limit

no code implementations ICLR 2021 Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.

General Classification · Multi-class Classification · +1
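
A minimal sketch of the Gaussian-process side of such a comparison is below: posterior predictive mean and variance from the standard GP regression equations. The kernel is a simple stand-in rather than an exact NNGP kernel, and the finite-width ensembles the paper compares against are not shown.

    # Hedged sketch: GP posterior predictive mean and variance, as one would
    # compute for an NNGP. The kernel here is a stand-in, not an NNGP kernel.
    import numpy as np

    def k(X, Z):                                   # stand-in kernel on 1-D inputs
        return np.exp(-0.5 * (X[:, None] - Z[None, :]) ** 2)

    rng = np.random.default_rng(0)
    Xtr = rng.uniform(-3, 3, 20); ytr = np.sin(Xtr) + 0.1 * rng.standard_normal(20)
    Xte = np.linspace(-4, 4, 200)

    noise = 0.1 ** 2
    K = k(Xtr, Xtr) + noise * np.eye(len(Xtr))
    Ks, Kss = k(Xte, Xtr), k(Xte, Xte)
    mean = Ks @ np.linalg.solve(K, ytr)            # posterior predictive mean
    cov = Kss - Ks @ np.linalg.solve(K, Ks.T)      # posterior predictive covariance
    std = np.sqrt(np.clip(np.diag(cov), 0, None))
    print(mean[:5], std[:5])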

Exploring the Uncertainty Properties of Neural Networks' Implicit Priors in the Infinite-Width Limit

1 code implementation 14 Oct 2020 Ben Adlam, Jaehoon Lee, Lechao Xiao, Jeffrey Pennington, Jasper Snoek

This gives us a better understanding of the implicit prior NNs place on function space and allows a direct comparison of the calibration of the NNGP and its finite-width analogue.

General Classification · Multi-class Classification · +1

Finite Versus Infinite Neural Networks: an Empirical Study

no code implementations NeurIPS 2020 Jaehoon Lee, Samuel S. Schoenholz, Jeffrey Pennington, Ben Adlam, Lechao Xiao, Roman Novak, Jascha Sohl-Dickstein

We perform a careful, thorough, and large scale empirical study of the correspondence between wide neural networks and kernel methods.

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

no code implementations NeurIPS 2020 Wei Hu, Lechao Xiao, Ben Adlam, Jeffrey Pennington

Modern neural networks are often regarded as complex black-box functions whose behavior is difficult to understand owing to their nonlinear dependence on the data and the nonconvexity in their loss landscapes.

Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear Networks

no code implementations ICLR 2020 Wei Hu, Lechao Xiao, Jeffrey Pennington

The selection of initial parameter values for gradient-based optimization of deep neural networks is one of the most impactful hyperparameter choices in deep learning systems, affecting both convergence times and model performance.
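
A minimal numpy sketch of the initialization in question is below: orthogonal weight matrices obtained from the QR decomposition of Gaussian matrices, stacked into a deep linear network whose end-to-end map is an isometry. Layer sizes and depth are illustrative.

    # Hedged sketch: orthogonal initialization of a deep linear network via
    # QR decomposition of Gaussian matrices. Layer sizes are illustrative.
    import numpy as np

    def orthogonal(rng, n_out, n_in):
        A = rng.standard_normal((max(n_out, n_in), min(n_out, n_in)))
        Q, R = np.linalg.qr(A)
        Q *= np.sign(np.diag(R))            # fix column signs
        return Q.T if n_out < n_in else Q   # shape (n_out, n_in)

    rng = np.random.default_rng(0)
    widths = [784] + [512] * 20 + [10]      # a 21-layer deep linear network
    Ws = [orthogonal(rng, m, n) for n, m in zip(widths[:-1], widths[1:])]

    # The end-to-end map is an isometry onto its range: all singular values
    # equal 1, unlike a product of i.i.d. Gaussian matrices at this depth.
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    print(np.linalg.svd(P, compute_uv=False)[:5])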

Disentangling Trainability and Generalization in Deep Neural Networks

no code implementations ICML 2020 Lechao Xiao, Jeffrey Pennington, Samuel S. Schoenholz

A longstanding goal in the theory of deep learning is to characterize the conditions under which a given neural network architecture will be trainable, and if so, how well it might generalize to unseen data.

Gaussian Processes

Disentangling Trainability and Generalization in Deep Learning

no code implementations 25 Sep 2019 Lechao Xiao, Jeffrey Pennington, Sam Schoenholz

In this paper, we discuss these challenging issues in the context of wide neural networks at large depths, where we will see that the situation simplifies considerably.

Deep Learning · Gaussian Processes

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

3 code implementations ICML 2018 Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme.
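
The initialization scheme introduced in this paper is commonly referred to as delta-orthogonal initialization; the sketch below is a hedged reconstruction of the basic idea (an orthogonal channel-mixing matrix at the spatial center of the filter, zeros at every other spatial position), with shapes and conventions chosen for illustration rather than taken from the released code.

    # Hedged sketch of a delta-orthogonal style convolution initializer.
    # Shapes/conventions are illustrative; see the paper and its released
    # code for the exact scheme.
    import numpy as np

    def delta_orthogonal(rng, c_out, c_in, ksize=3):
        A = rng.standard_normal((c_out, c_out))
        Q, R = np.linalg.qr(A)
        Q *= np.sign(np.diag(R))                       # orthogonal c_out x c_out
        W = np.zeros((ksize, ksize, c_in, c_out))      # HWIO layout
        W[ksize // 2, ksize // 2] = Q[:c_in, :]        # center tap only
        return W

    rng = np.random.default_rng(0)
    W = delta_orthogonal(rng, c_out=64, c_in=64, ksize=3)
    center = W[1, 1]                                   # (c_in, c_out) block
    print(np.allclose(center @ center.T, np.eye(64)))  # True: orthogonal mixing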
