Search Results for author: Guy Gur-Ari

Found 8 papers, 2 papers with code

The large learning rate phase of deep learning

1 code implementation • 1 Jan 2021 • Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.
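A minimal sketch (not the paper's experiments) of the quantity separating the two phases: in the linearized, infinitely wide description, training is effectively gradient descent on a quadratic loss, which converges only when the learning rate is below 2 divided by the largest curvature eigenvalue. The matrix and vectors below are random stand-ins.

```python
# Toy illustration: gradient descent on a quadratic loss 0.5 * w^T H w converges
# only for learning rates below 2 / lambda_max(H). The paper studies what wide
# nonlinear networks do above that threshold (the "catapult" phase).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((20, 20))
H = A @ A.T / 20                      # random PSD stand-in for a Hessian/kernel
lam_max = np.linalg.eigvalsh(H)[-1]   # largest curvature eigenvalue

def final_loss(lr, steps=200):
    w = rng.standard_normal(20)
    for _ in range(steps):
        w = w - lr * (H @ w)          # gradient descent step
    return 0.5 * w @ H @ w

for lr in [0.5 / lam_max, 1.9 / lam_max, 2.1 / lam_max]:
    print(f"lr = {lr:.3f} (2/lambda_max = {2 / lam_max:.3f}): final loss = {final_loss(lr):.3e}")
# Below 2/lambda_max the loss decays; above it the quadratic model diverges,
# whereas the paper argues wide nonlinear networks instead settle into flatter regions.
```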

Are wider nets better given the same number of parameters?

2 code implementations • ICLR 2021 • Anna Golubeva, Behnam Neyshabur, Guy Gur-Ari

Empirical studies demonstrate that the performance of neural networks improves with increasing number of parameters.
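To separate the effect of width from the effect of parameter count, widening a layer while holding the parameter budget fixed requires dropping connections. A back-of-the-envelope sketch, assuming plain fully connected layers (the layer sizes below are illustrative, not the paper's architectures):

```python
# If both dimensions of a dense layer are widened by a factor k while the total
# number of weights is held fixed, only a 1/k**2 fraction of connections can be kept.
def density_for_fixed_params(n_in, n_out, k):
    """Fraction of connections kept when widths scale by k at fixed parameter budget."""
    dense_params = n_in * n_out
    widened_connections = (k * n_in) * (k * n_out)
    return dense_params / widened_connections   # = 1 / k**2

for k in [1, 2, 4, 8]:
    print(f"width x{k}: keep {density_for_fixed_params(512, 512, k):.3%} of connections")
# width x1: 100.000%, x2: 25.000%, x4: 6.250%, x8: 1.562%
```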

On the training dynamics of deep networks with $L_2$ regularization

no code implementations • NeurIPS 2020 • Aitor Lewkowycz, Guy Gur-Ari

Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks.

Image Classification
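For context on the quantities this paper relates, here is a minimal sketch (toy update, not the paper's result) of how the $L_2$ coefficient enters the training dynamics: the regularizer alone shrinks the weights on a timescale of roughly one over the product of learning rate and coefficient.

```python
# With L2 coefficient lam, a gradient-descent step on loss + 0.5*lam*||w||^2 is
# w <- (1 - lr*lam) * w - lr * grad, so the regularizer decays weights over ~1/(lr*lam) steps.
import numpy as np

def l2_sgd_step(w, grad, lr, lam):
    """One gradient-descent step on loss + 0.5 * lam * ||w||^2."""
    return w - lr * (grad + lam * w)

w = np.ones(4)
lr, lam = 0.1, 0.01
for step in range(1, 1001):
    w = l2_sgd_step(w, grad=np.zeros_like(w), lr=lr, lam=lam)  # pure decay, zero gradient
    if step in (100, 1000):
        print(f"step {step}: ||w|| = {np.linalg.norm(w):.4f}")
# ||w|| shrinks like (1 - lr*lam)**step, i.e. on a timescale ~ 1/(lr*lam) = 1000 steps here.
```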

On the asymptotics of wide networks with polynomial activations

no code implementations • 11 Jun 2020 • Kyle Aitken, Guy Gur-Ari

We consider an existing conjecture addressing the asymptotic behavior of neural networks in the large width limit.

The large learning rate phase of deep learning: the catapult mechanism

no code implementations • 4 Mar 2020 • Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Sohl-Dickstein, Guy Gur-Ari

In the small learning rate phase, training can be understood using the existing theory of infinitely wide neural networks.

Asymptotics of Wide Networks from Feynman Diagrams

no code implementations • ICLR 2020 • Ethan Dyer, Guy Gur-Ari

Understanding the asymptotic behavior of wide networks is of considerable interest.

Wider Networks Learn Better Features

no code implementations • 25 Sep 2019 • Dar Gilboa, Guy Gur-Ari

Transferability of learned features between tasks can massively reduce the cost of training a neural network on a novel task.

Gradient Descent Happens in a Tiny Subspace

no code implementations • ICLR 2019 • Guy Gur-Ari, Daniel A. Roberts, Ethan Dyer

We show that in a variety of large-scale deep learning scenarios the gradient dynamically converges to a very small subspace after a short period of training.

General Classification
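The claim above can be phrased as a concrete measurement: how much of the squared gradient norm lies in the span of the top-k Hessian eigenvectors at the current parameters. A small sketch of that measurement on synthetic stand-ins (the gradient below is constructed to be concentrated; for real networks the paper reports the fraction stays close to 1 after a short period of training, with k far smaller than the number of parameters):

```python
# Fraction of the squared gradient norm captured by the top-k eigenvectors of the Hessian.
import numpy as np

def gradient_fraction_in_top_subspace(g, H, k):
    """Share of ||g||^2 lying in the span of the top-k eigenvectors of H."""
    eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
    top = eigvecs[:, -k:]                  # top-k eigenvectors as columns
    proj = top.T @ g                       # coordinates of g in that subspace
    return float(proj @ proj) / float(g @ g)

# Synthetic example: a "gradient" deliberately aligned with the top three directions.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
H = A @ A.T / 50
_, eigvecs = np.linalg.eigh(H)
g = eigvecs[:, -3:] @ rng.standard_normal(3) + 0.05 * rng.standard_normal(50)
print(f"fraction in top-3 subspace: {gradient_fraction_in_top_subspace(g, H, 3):.3f}")
```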
