Search Results for author: Yasaman Bahri

Found 13 papers, 8 papers with code

The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning

no code implementations 30 Jun 2021 Anders Andreassen, Yasaman Bahri, Behnam Neyshabur, Rebecca Roelofs

Although machine learning models typically experience a drop in performance on out-of-distribution data, accuracies on in- versus out-of-distribution data are widely observed to follow a single linear trend when evaluated across a testbed of models.
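
For illustration only, a minimal numpy sketch of fitting such a single linear trend, using hypothetical in-/out-of-distribution accuracy pairs in place of real model evaluations (the paper's testbed, models, and any axis transformations are not reproduced):

import numpy as np

# Hypothetical (in-distribution, out-of-distribution) accuracy pairs for a
# testbed of models; real evaluations would replace these values.
in_dist  = np.array([0.72, 0.78, 0.83, 0.87, 0.91, 0.94])
out_dist = np.array([0.41, 0.48, 0.55, 0.61, 0.67, 0.72])

# Fit a single linear trend: OOD accuracy ~ slope * ID accuracy + intercept.
slope, intercept = np.polyfit(in_dist, out_dist, deg=1)
residuals = out_dist - (slope * in_dist + intercept)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, "
      f"max deviation from trend={np.abs(residuals).max():.3f}")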

Explaining Neural Scaling Laws

1 code implementation 12 Feb 2021 Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma

In the large width limit, this can be equivalently obtained from the spectrum of certain kernels, and we present evidence that large width and large dataset resolution-limited scaling exponents are related by a duality.
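
As a minimal illustration of how a scaling exponent is read off empirically, a sketch that fits a power law L(D) = c * D^(-alpha) to synthetic losses via a log-log linear fit (the data here is made up; the paper derives and measures such exponents for real models and datasets):

import numpy as np

# Synthetic losses following L(D) = c * D**(-alpha), with noise, standing in
# for measured test losses at several dataset sizes D.
rng = np.random.default_rng(0)
D = np.array([1e3, 3e3, 1e4, 3e4, 1e5, 3e5])
true_alpha, c = 0.35, 5.0
L = c * D**(-true_alpha) * np.exp(rng.normal(0, 0.02, size=D.shape))

# A power law is a straight line in log-log space: log L = log c - alpha * log D.
slope, log_c = np.polyfit(np.log(D), np.log(L), deg=1)
print(f"estimated exponent alpha = {-slope:.3f} (true value {true_alpha})")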

The large learning rate phase of deep learning

1 code implementation 1 Jan 2021 Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.

Exact posterior distributions of wide Bayesian neural networks

1 code implementation 18 Jun 2020 Jiri Hron, Yasaman Bahri, Roman Novak, Jeffrey Pennington, Jascha Sohl-Dickstein

Recent work has shown that the prior over functions induced by a deep Bayesian neural network (BNN) behaves as a Gaussian process (GP) as the width of all layers becomes large.
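
A minimal sketch of the width effect behind this statement: outputs of randomly initialized one-hidden-layer ReLU networks at a fixed input become increasingly Gaussian (excess kurtosis near zero) as width grows. This only illustrates the prior-to-GP convergence the snippet refers to, not the paper's posterior analysis; all sizes below are arbitrary.

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10)          # a fixed input
n_samples = 20000

def sample_output(width):
    # One-hidden-layer ReLU network with 1/sqrt(fan_in) weight scaling.
    W1 = rng.normal(size=(width, x.size)) / np.sqrt(x.size)
    w2 = rng.normal(size=width) / np.sqrt(width)
    return w2 @ np.maximum(W1 @ x, 0.0)

for width in (2, 16, 256):
    out = np.array([sample_output(width) for _ in range(n_samples)])
    z = (out - out.mean()) / out.std()
    excess_kurtosis = (z**4).mean() - 3.0   # ~0 for a Gaussian
    print(f"width={width:4d}  excess kurtosis={excess_kurtosis:+.3f}")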

Infinite attention: NNGP and NTK for deep attention networks

1 code implementation ICML 2020 Jiri Hron, Yasaman Bahri, Jascha Sohl-Dickstein, Roman Novak

There is a growing amount of literature on the relationship between wide neural networks (NNs) and Gaussian processes (GPs), identifying an equivalence between the two for a variety of NN architectures.

Deep Attention, Gaussian Processes
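
As background for the NTK side of this correspondence, a minimal numpy sketch of the empirical NTK of a small fully connected ReLU network, Theta(x1, x2) = <grad_theta f(x1), grad_theta f(x2)>. A dense network is used here purely for brevity; the paper's attention architectures and their infinite-width limits are not reproduced.

import numpy as np

rng = np.random.default_rng(0)
d_in, width = 4, 64
W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
w2 = rng.normal(size=width) / np.sqrt(width)

def param_gradient(x):
    # Gradients of f(x) = w2 @ relu(W1 @ x) w.r.t. all parameters, flattened.
    h = W1 @ x
    a = np.maximum(h, 0.0)
    dW1 = np.outer(w2 * (h > 0), x)   # d f / d W1
    dw2 = a                           # d f / d w2
    return np.concatenate([dW1.ravel(), dw2])

def empirical_ntk(x1, x2):
    # Theta(x1, x2) = inner product of parameter gradients at x1 and x2.
    return param_gradient(x1) @ param_gradient(x2)

x1, x2 = rng.normal(size=d_in), rng.normal(size=d_in)
print(empirical_ntk(x1, x1), empirical_ntk(x1, x2))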

The large learning rate phase of deep learning: the catapult mechanism

no code implementations 4 Mar 2020 Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.
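
A minimal sketch of the two regimes on a toy two-layer linear model with a single training example, in the spirit of the paper's analysis (width, target, and step count here are arbitrary): below the critical learning rate 2/lambda the loss decreases monotonically, while somewhat above it the loss first grows and then converges as the curvature shrinks; far above that range, gradient descent diverges.

import numpy as np

rng = np.random.default_rng(0)
m = 512                                  # hidden width
x, y = 1.0, 1.0                          # a single training example
u = rng.normal(size=m)                   # first-layer weights
v = rng.normal(size=m)                   # second-layer weights

def run(lr, steps=80):
    uu, vv = u.copy(), v.copy()
    losses = []
    for _ in range(steps):
        f = vv @ uu * x / np.sqrt(m)     # f(x) = v . (u x) / sqrt(m)
        err = f - y
        losses.append(0.5 * err**2)
        gu = err * vv * x / np.sqrt(m)   # gradient of the loss w.r.t. u
        gv = err * uu * x / np.sqrt(m)   # gradient of the loss w.r.t. v
        uu, vv = uu - lr * gu, vv - lr * gv
    return np.array(losses)

# Kernel eigenvalue of this model at initialization; 2/lam is critical.
lam = (u @ u + v @ v) * x**2 / m
for scale in (0.5, 1.5):                 # below vs. above the critical rate
    losses = run(scale * 2.0 / lam)
    print(f"lr = {scale} * (2/lam): initial {losses[0]:.2f}, "
          f"max {losses.max():.2f}, final {losses[-1]:.2e}")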

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

3 code implementations ICML 2018 Lechao Xiao, Yasaman Bahri, Jascha Sohl-Dickstein, Samuel S. Schoenholz, Jeffrey Pennington

In this work, we demonstrate that it is possible to train vanilla CNNs with ten thousand layers or more simply by using an appropriate initialization scheme.
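
The initialization in question is a "delta-orthogonal" one: the convolution kernel is zero everywhere except its spatial center, which holds a (scaled) orthogonal matrix over channels. A minimal numpy sketch, assuming an HWIO kernel layout and c_out >= c_in; the paper's exact construction and gain may differ.

import numpy as np

def delta_orthogonal(ksize, c_in, c_out, gain=1.0, rng=None):
    """Kernel of shape (ksize, ksize, c_in, c_out): zeros everywhere except
    the spatial center, which holds a (scaled) orthogonal channel map."""
    assert c_out >= c_in, "needs at least as many output as input channels"
    rng = np.random.default_rng() if rng is None else rng
    # Random matrix with orthonormal columns via QR of a Gaussian matrix.
    q, r = np.linalg.qr(rng.normal(size=(c_out, c_in)))
    q *= np.sign(np.diag(r))             # fix the sign ambiguity of QR
    kernel = np.zeros((ksize, ksize, c_in, c_out))
    kernel[ksize // 2, ksize // 2] = gain * q.T
    return kernel

k = delta_orthogonal(3, 64, 64, rng=np.random.default_rng(0))
center = k[1, 1]                         # shape (c_in, c_out)
print(k.shape, np.allclose(center @ center.T, np.eye(64)))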

Sensitivity and Generalization in Neural Networks: an Empirical Study

no code implementations ICLR 2018 Roman Novak, Yasaman Bahri, Daniel A. Abolafia, Jeffrey Pennington, Jascha Sohl-Dickstein

In practice it is often found that large over-parameterized neural networks generalize better than their smaller counterparts, an observation that appears to conflict with classical notions of function complexity, which typically favor smaller models.

Data Augmentation, Image Classification
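
One sensitivity measure studied in this line of work is the norm of the network's input-output Jacobian. A minimal numpy sketch for a tiny two-layer ReLU network (sizes and scalings are arbitrary; the paper evaluates trained image classifiers, which this does not attempt to reproduce):

import numpy as np

rng = np.random.default_rng(0)
d_in, width, d_out = 8, 32, 4
W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
W2 = rng.normal(size=(d_out, width)) / np.sqrt(width)

def jacobian_frobenius_norm(x):
    # For f(x) = W2 @ relu(W1 @ x), the Jacobian is W2 @ diag(relu'(W1 x)) @ W1.
    mask = (W1 @ x > 0).astype(float)
    J = W2 @ (mask[:, None] * W1)
    return np.linalg.norm(J)             # Frobenius norm for a matrix

xs = rng.normal(size=(100, d_in))
norms = np.array([jacobian_frobenius_norm(x) for x in xs])
print(f"mean Jacobian norm over 100 random inputs: {norms.mean():.3f}")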

Deep Neural Networks as Gaussian Processes

5 code implementations ICLR 2018 Jaehoon Lee, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, Jascha Sohl-Dickstein

As such, previous work has not identified that these kernels can be used as covariance functions for GPs and allow fully Bayesian prediction with a deep neural network.

Bayesian Inference, Gaussian Processes
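
The correspondence makes the GP covariance computable layer by layer, and for ReLU activations each layer's expectation has a closed form (the arc-cosine kernel). A minimal numpy sketch of that recursion for a pair of inputs, with arbitrary weight and bias variances; the paper's efficient implementation and hyperparameter choices are not reproduced.

import numpy as np

def nngp_relu_kernel(x1, x2, depth=3, sigma_w2=2.0, sigma_b2=0.1):
    """NNGP covariance K(x1, x2) for a ReLU network with `depth` hidden
    layers, built up with the closed-form ReLU (arc-cosine) expectation."""
    d = x1.size
    # Covariance induced by the first affine layer.
    k12 = sigma_b2 + sigma_w2 * (x1 @ x2) / d
    k11 = sigma_b2 + sigma_w2 * (x1 @ x1) / d
    k22 = sigma_b2 + sigma_w2 * (x2 @ x2) / d
    for _ in range(depth):
        c = np.clip(k12 / np.sqrt(k11 * k22), -1.0, 1.0)
        theta = np.arccos(c)
        # E[relu(u) relu(v)] for (u, v) ~ N(0, [[k11, k12], [k12, k22]]).
        ev = np.sqrt(k11 * k22) * (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / (2 * np.pi)
        k12 = sigma_b2 + sigma_w2 * ev
        k11 = sigma_b2 + sigma_w2 * k11 / 2.0   # E[relu(u)^2] = k11 / 2
        k22 = sigma_b2 + sigma_w2 * k22 / 2.0
    return k12

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=16), rng.normal(size=16)
print(nngp_relu_kernel(x1, x2), nngp_relu_kernel(x1, x1))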

Geometry of Neural Network Loss Surfaces via Random Matrix Theory

no code implementations ICML 2017 Jeffrey Pennington, Yasaman Bahri

We introduce an analytical framework and a set of tools from random matrix theory that allow us to compute an approximation of this distribution under a set of simplifying assumptions.
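
For flavor, a minimal sketch of the kind of random-matrix computation involved: the eigenvalue spectrum of a Wishart-type matrix compared with the Marchenko-Pastur support. This is purely illustrative; the paper's model of the Hessian combines specific ensembles under its own assumptions.

import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 1000                        # matrix dimensions, aspect ratio q = n/p
q = n / p
X = rng.normal(size=(n, p)) / np.sqrt(p)
eigs = np.linalg.eigvalsh(X @ X.T)      # Wishart-type matrix

# Marchenko-Pastur support edges for unit variance and ratio q.
lam_minus, lam_plus = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
print(f"empirical spectrum in [{eigs.min():.3f}, {eigs.max():.3f}]")
print(f"Marchenko-Pastur support [{lam_minus:.3f}, {lam_plus:.3f}]")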
