Search Results for author: Greg Yang

Found 37 papers, 15 papers with code

A Spectral Condition for Feature Learning

no code implementations 26 Oct 2023 Greg Yang, James B. Simon, Jeremy Bernstein

The push to train ever larger neural networks has motivated the study of initialization and training at large network width.

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks

no code implementations 3 Oct 2023 Greg Yang, Dingli Yu, Chen Zhu, Soufiane Hayou

By classifying infinite-width neural networks and identifying the *optimal* limit, Tensor Programs IV and V demonstrated a universal way, called $\mu$P, for *widthwise hyperparameter transfer*, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones.
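
As a concrete illustration, here is a minimal sketch of widthwise hyperparameter transfer with $\mu$P, assuming the API of the `mup` package released alongside Tensor Programs V (`MuReadout`, `set_base_shapes`, `MuAdam`); the widths and learning rate are placeholders rather than values from the paper, and the exact imports should be checked against the package.

```python
import torch.nn as nn
from mup import MuReadout, set_base_shapes, MuAdam

def make_mlp(width, d_in=784, d_out=10):
    return nn.Sequential(
        nn.Linear(d_in, width),
        nn.ReLU(),
        nn.Linear(width, width),
        nn.ReLU(),
        MuReadout(width, d_out),   # muP-aware output layer in place of nn.Linear
    )

base = make_mlp(width=64)     # narrow proxy on which hyperparameters are tuned
delta = make_mlp(width=128)   # second width, used to infer which dims scale with width
model = make_mlp(width=4096)  # wide target model

set_base_shapes(model, base, delta=delta)   # attach muP shape metadata to the wide model
opt = MuAdam(model.parameters(), lr=1e-3)   # lr found on the narrow proxy is reused as-is
```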

Tensor Programs IVb: Adaptive Optimization in the Infinite-Width Limit

no code implementations 3 Aug 2023 Greg Yang, Etai Littwin

Going beyond stochastic gradient descent (SGD), what new phenomena emerge in wide neural networks trained by adaptive optimizers like Adam?

Width and Depth Limits Commute in Residual Networks

no code implementations 1 Feb 2023 Soufiane Hayou, Greg Yang

We show that taking the width and depth to infinity in a deep neural network with skip connections, when branches are scaled by $1/\sqrt{depth}$ (the only nontrivial scaling), results in the same covariance structure no matter how that limit is taken.
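
A minimal PyTorch sketch of the scaling discussed above, with each residual branch multiplied by $1/\sqrt{depth}$; the width, depth, and block architecture are arbitrary choices for illustration, not the paper's setup.

```python
import math
import torch
import torch.nn as nn

class ScaledResBlock(nn.Module):
    def __init__(self, width, depth):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(width, width), nn.ReLU())
        self.scale = 1.0 / math.sqrt(depth)   # the 1/sqrt(depth) branch scaling

    def forward(self, x):
        return x + self.scale * self.branch(x)   # skip connection + scaled branch

depth, width = 64, 512
net = nn.Sequential(*[ScaledResBlock(width, depth) for _ in range(depth)])
y = net(torch.randn(8, width))   # covariance of y stays stable as width and depth grow
```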

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation

no code implementations 3 May 2022 Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang

We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$, where $\boldsymbol{W}\in\mathbb{R}^{d\times N}, \boldsymbol{a}\in\mathbb{R}^{N}$ are randomly initialized, and the training objective is the empirical MSE loss: $\frac{1}{n}\sum_{i=1}^n (f(\boldsymbol{x}_i)-y_i)^2$.
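
The setting above can be written out directly; the sketch below builds the two-layer network, forms the empirical MSE loss, and takes a single gradient step on $\boldsymbol{W}$. The choice of ReLU for $\sigma$, the dimensions, and the step size are illustrative assumptions, not the paper's asymptotic regime.

```python
import torch

d, N, n = 32, 256, 512                      # input dim, width, sample size
W = torch.randn(d, N, requires_grad=True)   # first-layer weights (trained)
a = torch.randn(N)                          # second-layer weights (held fixed here)
X = torch.randn(n, d)
y = torch.randn(n)

def f(X):
    # f(x) = a^T sigma(W^T x) / sqrt(N), with sigma = ReLU for illustration
    return (torch.relu(X @ W) @ a) / N**0.5

loss = ((f(X) - y) ** 2).mean()   # empirical MSE: (1/n) sum_i (f(x_i) - y_i)^2
loss.backward()

eta = 1.0
with torch.no_grad():
    W -= eta * W.grad             # the single first-layer gradient step studied in the paper
```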

Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer

3 code implementations 7 Mar 2022 Greg Yang, Edward J. Hu, Igor Babuschkin, Szymon Sidor, Xiaodong Liu, David Farhi, Nick Ryder, Jakub Pachocki, Weizhu Chen, Jianfeng Gao

Hyperparameter (HP) tuning in deep learning is an expensive process, prohibitively so for neural networks (NNs) with billions of parameters.

CLUES: Few-Shot Learning Evaluation in Natural Language Understanding

1 code implementation 4 Nov 2021 Subhabrata Mukherjee, Xiaodong Liu, Guoqing Zheng, Saghar Hosseini, Hao Cheng, Greg Yang, Christopher Meek, Ahmed Hassan Awadallah, Jianfeng Gao

We demonstrate that while recent models reach human performance when they have access to large amounts of labeled data, there is a huge gap in performance in the few-shot setting for most tasks.

Few-Shot Learning, Natural Language Understanding

Efficient Computation of Deep Nonlinear Infinite-Width Neural Networks that Learn Features

no code implementations ICLR 2022 Greg Yang, Michael Santacroce, Edward J Hu

While a popular infinite-width limit of neural networks, the Neural Tangent Kernel (NTK) often exhibits performance gaps relative to finite-width neural networks on standard datasets, due to its lack of feature learning.

3DB: A Framework for Debugging Computer Vision Models

1 code implementation 7 Jun 2021 Guillaume Leclerc, Hadi Salman, Andrew Ilyas, Sai Vemprala, Logan Engstrom, Vibhav Vineet, Kai Xiao, Pengchuan Zhang, Shibani Santurkar, Greg Yang, Ashish Kapoor, Aleksander Madry

We introduce 3DB: an extendable, unified framework for testing and debugging vision models using photorealistic simulation.

Tensor Programs IIb: Architectural Universality of Neural Tangent Kernel Training Dynamics

no code implementations 8 May 2021 Greg Yang, Etai Littwin

To facilitate this proof, we develop a graphical notation for Tensor Programs.

Feature Learning in Infinite-Width Neural Networks

4 code implementations 30 Nov 2020 Greg Yang, Edward J. Hu

However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn features, which is crucial for pretraining and transfer learning such as with BERT.

Few-Shot Learning, Transfer Learning

Tensor Programs III: Neural Matrix Laws

no code implementations 22 Sep 2020 Greg Yang

FIP (the Free Independence Principle) and these results hold for any neural architecture.

Tensor Programs II: Neural Tangent Kernel for Any Architecture

2 code implementations 25 Jun 2020 Greg Yang

We prove that a randomly initialized neural network of *any architecture* has its Tangent Kernel (NTK) converge to a deterministic limit, as the network widths tend to infinity.
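
For intuition, the sketch below computes one entry of the finite-width empirical NTK of a small MLP as an inner product of parameter gradients; as the width grows, such entries concentrate around the deterministic limit the paper establishes. The architecture, activation, and width here are arbitrary.

```python
import torch
import torch.nn as nn

width = 2048
net = nn.Sequential(nn.Linear(10, width), nn.Tanh(), nn.Linear(width, 1))

def param_grad(x):
    # gradient of the scalar output with respect to all parameters, flattened
    net.zero_grad()
    net(x).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

x1, x2 = torch.randn(1, 10), torch.randn(1, 10)
k12 = param_grad(x1) @ param_grad(x2)   # empirical NTK entry K(x1, x2)
print(k12.item())
```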

Improved Image Wasserstein Attacks and Defenses

1 code implementation 26 Apr 2020 Edward J. Hu, Adith Swaminathan, Hadi Salman, Greg Yang

Robustness against image perturbations bounded by an $\ell_p$ ball has been well studied in the recent literature.

On Infinite-Width Hypernetworks

1 code implementation NeurIPS 2020 Etai Littwin, Tomer Galanti, Lior Wolf, Greg Yang

*Hypernetworks* are architectures that produce the weights of a task-specific *primary network*.

Meta-Learning
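
A minimal sketch of the idea: a hypernetwork maps a task embedding to the weights and bias of a single-layer primary network, which is then applied to the input. The shapes and the choice of a linear hypernetwork are illustrative, not the constructions analyzed in the paper.

```python
import torch
import torch.nn as nn

class HyperLinear(nn.Module):
    def __init__(self, task_dim, d_in, d_out):
        super().__init__()
        # hypernetwork: maps a task embedding to the primary layer's parameters
        self.hyper = nn.Linear(task_dim, d_in * d_out + d_out)
        self.d_in, self.d_out = d_in, d_out

    def forward(self, x, task_emb):
        params = self.hyper(task_emb)
        W = params[: self.d_in * self.d_out].view(self.d_out, self.d_in)
        b = params[self.d_in * self.d_out:]
        return x @ W.t() + b          # primary network: a single linear layer

layer = HyperLinear(task_dim=16, d_in=8, d_out=4)
out = layer(torch.randn(32, 8), torch.randn(16))   # one task embedding, a batch of inputs
```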

Randomized Smoothing of All Shapes and Sizes

1 code implementation ICML 2020 Greg Yang, Tony Duan, J. Edward Hu, Hadi Salman, Ilya Razenshteyn, Jerry Li

Randomized smoothing is the current state-of-the-art defense with provable robustness against $\ell_2$ adversarial attacks.
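
For reference, the sketch below shows the standard Gaussian ($\ell_2$) instance of the smoothed prediction rule: classify many noisy copies of the input with a base classifier and return the majority class. The paper studies smoothing distributions well beyond this case; the base classifier, `sigma`, and the sample count here are placeholders.

```python
import torch

def smoothed_predict(f, x, sigma=0.25, n_samples=1000):
    # g(x) ~ argmax_c P[f(x + eps) = c], eps ~ N(0, sigma^2 I), estimated by Monte Carlo
    noise = sigma * torch.randn(n_samples, *x.shape)
    logits = f(x.unsqueeze(0) + noise)          # f maps a batch of inputs to class logits
    votes = torch.bincount(logits.argmax(dim=1))
    return votes.argmax().item()                # most frequent class under noise

# toy base classifier for demonstration: a random linear map to 10 class logits
base = torch.nn.Linear(3 * 32 * 32, 10)
f = lambda batch: base(batch.flatten(1))
print(smoothed_predict(f, torch.randn(3, 32, 32)))
```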

Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

1 code implementation NeurIPS 2019 Greg Yang

Wide neural networks with random weights and biases are Gaussian processes, as observed by Neal (1995) for shallow networks, and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks.

Gaussian Processes
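
A quick empirical check of the statement, sketched below: sample many independently initialized wide MLPs, evaluate them at a fixed input, and the resulting outputs are approximately Gaussian. The width, depth, activation, and sample counts are arbitrary.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 20)                       # a fixed input
samples = []
for _ in range(1000):                        # 1000 independent random initializations
    net = nn.Sequential(nn.Linear(20, 2048), nn.Tanh(), nn.Linear(2048, 1))
    with torch.no_grad():
        samples.append(net(x).item())
samples = torch.tensor(samples)
print(samples.mean().item(), samples.std().item())   # a histogram of `samples` looks Gaussian
```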

Tensor Programs I: Wide Feedforward or Recurrent Neural Networks of Any Architecture are Gaussian Processes

2 code implementations 28 Oct 2019 Greg Yang

Wide neural networks with random weights and biases are Gaussian processes, as originally observed by Neal (1995) and more recently by Lee et al. (2018) and Matthews et al. (2018) for deep fully-connected networks, as well as by Novak et al. (2019) and Garriga-Alonso et al. (2019) for deep convolutional networks.

Gaussian Processes

The Dynamics of Signal Propagation in Gated Recurrent Neural Networks

no code implementations 25 Sep 2019 Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower.

Free resolutions of function classes via order complexes

no code implementations 5 Sep 2019 Justin Chen, Christopher Eur, Greg Yang, Mengyuan Zhang

Function classes are collections of Boolean functions on a finite set, which are fundamental objects of study in theoretical computer science.

Learning Theory

A Fine-Grained Spectral Perspective on Neural Networks

1 code implementation 24 Jul 2019 Greg Yang, Hadi Salman

Are neural networks biased toward simple functions?

A Convex Relaxation Barrier to Tight Robustness Verification of Neural Networks

3 code implementations NeurIPS 2019 Hadi Salman, Greg Yang, Huan Zhang, Cho-Jui Hsieh, Pengchuan Zhang

This framework works for neural networks with diverse architectures and nonlinearities and covers both primal and dual views of robustness verification.

A Mean Field Theory of Batch Normalization

no code implementations ICLR 2019 Greg Yang, Jeffrey Pennington, Vinay Rao, Jascha Sohl-Dickstein, Samuel S. Schoenholz

We develop a mean field theory for batch normalization in fully-connected feedforward neural networks.

Scaling Limits of Wide Neural Networks with Weight Sharing: Gaussian Process Behavior, Gradient Independence, and Neural Tangent Kernel Derivation

no code implementations 13 Feb 2019 Greg Yang

Several recent trends in machine learning theory and practice, from the design of state-of-the-art Gaussian Processes to the convergence analysis of deep neural nets (DNNs) under stochastic gradient descent (SGD), have found it fruitful to study wide random neural networks.

Gaussian Processes, Learning Theory

Dynamical Isometry and a Mean Field Theory of LSTMs and GRUs

no code implementations 25 Jan 2019 Dar Gilboa, Bo Chang, Minmin Chen, Greg Yang, Samuel S. Schoenholz, Ed H. Chi, Jeffrey Pennington

We demonstrate the efficacy of our initialization scheme on multiple sequence tasks, on which it enables successful training while a standard initialization either fails completely or is orders of magnitude slower.

Deep Mean Field Theory: Layerwise Variance and Width Variation as Methods to Control Gradient Explosion

no code implementations ICLR 2018 Greg Yang, Sam S. Schoenholz

Using the obtained mean field theory, we are able to track surprisingly well how VV at initialization time affects training and test time performance on MNIST after a set number of epochs: the level sets of test/train set accuracies coincide with the level sets of the expectations of certain gradient norms or of metric expressivity (as defined in Yang (2017)), a measure of expansion in a random neural network.

Mean Field Residual Networks: On the Edge of Chaos

no code implementations NeurIPS 2017 Greg Yang, Samuel S. Schoenholz

Classical feedforward neural networks, such as those with tanh activations, exhibit exponential behavior on average when propagating inputs forward or gradients backward.
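
The sketch below illustrates this exponential behavior in the forward direction: the gap between two inputs changes exponentially with depth as they propagate through a deep tanh network (it shrinks under PyTorch's default initialization scale; other scales make it grow). Depth and width are arbitrary.

```python
import torch
import torch.nn as nn

depth, width = 100, 512
layers = [nn.Sequential(nn.Linear(width, width), nn.Tanh()) for _ in range(depth)]

x1, x2 = torch.randn(1, width), torch.randn(1, width)
with torch.no_grad():
    for i, layer in enumerate(layers, 1):
        x1, x2 = layer(x1), layer(x2)
        if i % 20 == 0:
            # the distance between the two propagated inputs changes exponentially in depth
            print(i, (x1 - x2).norm().item())
```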

A Homological Theory of Functions

no code implementations 9 Jan 2017 Greg Yang

In computational complexity, a complexity class is given by a set of problems or functions, and a basic challenge is to show separations of complexity classes $A \not= B$ especially when $A$ is known to be a subset of $B$.

LEMMA

Lie-Access Neural Turing Machines

no code implementations 9 Nov 2016 Greg Yang, Alexander M. Rush

The head is moved via Lie group actions, such as shifts or rotations, generated by a controller, and memory access is performed by linear smoothing in key space.
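
A rough sketch of the key-space read described above: read weights are obtained by smoothing over stored keys according to their distance from the current head position (an inverse-square kernel here). The kernel, shapes, and the absence of a controller are simplifications; the paper's exact weighting may differ.

```python
import torch

def key_space_read(memory, keys, head_pos, eps=1e-6):
    # memory: (n, d) stored values; keys: (n, k) key coordinates; head_pos: (k,)
    dist2 = ((keys - head_pos) ** 2).sum(dim=1)   # squared distance of each key to the head
    w = 1.0 / (dist2 + eps)                       # inverse-square smoothing weights
    w = w / w.sum()
    return w @ memory                             # weighted average of stored values

memory, keys = torch.randn(10, 32), torch.randn(10, 2)
head_pos = torch.randn(2)   # in LANTM the head position is moved by Lie group actions (e.g. shifts)
read_vec = key_space_read(memory, keys, head_pos)
```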

Lie Access Neural Turing Machine

no code implementations 28 Feb 2016 Greg Yang

We found the right configuration of LANTM to outperform the baseline in all of our experiments.

Computabilities of Validity and Satisfiability in Probability Logics over Finite and Countable Models

no code implementations 12 Oct 2014 Greg Yang

In addition, most of the results of this paper, and of Kuyper and Terwijn, do not apply to individual languages with a finite number of unary predicates.

Learning Theory
