1 code implementation • 26 May 2022 • Emmanuel Abbe, Samy Bengio, Elisabetta Cornacchia, Jon Kleinberg, Aryo Lotfi, Maithra Raghu, Chiyuan Zhang
More generally, the paper considers the learning of logical functions with gradient descent (GD) on neural networks.
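As a toy illustration (not the paper's experimental setup), the sketch below trains a one-hidden-layer network with plain gradient descent on a sparse parity, a canonical logical function; the input dimension, sparsity, width, learning rate, and step count are illustrative assumptions. How quickly GD fits such a target, and what that implies for generalization, is the kind of question the paper studies.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy logical target: a 3-sparse parity over d = 10 Boolean (+/-1) inputs.
d, k, n = 10, 3, 2000
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, :k].prod(axis=1)                       # label in {-1, +1}

# One-hidden-layer tanh network trained with full-batch gradient descent.
h, lr = 128, 0.05
W1 = rng.normal(0, 1 / np.sqrt(d), (d, h))
b1 = np.zeros(h)
w2 = rng.normal(0, 1 / np.sqrt(h), h)

for step in range(2000):
    act = np.tanh(X @ W1 + b1)                  # hidden activations, shape (n, h)
    err = act @ w2 - y                          # residual of the squared loss
    g_act = np.outer(err, w2) * (1 - act**2)    # backprop through tanh
    W1 -= lr * (X.T @ g_act) / n
    b1 -= lr * g_act.mean(axis=0)
    w2 -= lr * (act.T @ err) / n

pred = np.sign(np.tanh(X @ W1 + b1) @ w2)
print(f"training accuracy on the parity target: {(pred == y).mean():.3f}")
```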
1 code implementation • 15 Feb 2022 • Thao Nguyen, Maithra Raghu, Simon Kornblith
Recent work has uncovered a striking phenomenon in large-capacity neural networks: they contain blocks of contiguous hidden layers with highly similar representations.
4 code implementations • NeurIPS 2021 • Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy
Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
2 code implementations • 27 Jul 2021 • Chiyuan Zhang, Maithra Raghu, Jon Kleinberg, Samy Bengio
In PVR, this is done by having one part of the task input act as a pointer to a different input location, whose value forms the output.
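A hedged sketch of how such a pointer value retrieval instance can be generated: the first entry of each input acts as the pointer, and the value stored at the indicated position is the label. The window size, value alphabet, and flat integer encoding are illustrative assumptions, not the paper's exact task variants.

```python
import numpy as np

def make_pvr_batch(n_samples, n_values=10, window=10, seed=0):
    """Toy pointer value retrieval data: position 0 of each input is a pointer
    into the remaining `window` positions, and the value stored at the
    pointed-to position is the label."""
    rng = np.random.default_rng(seed)
    pointers = rng.integers(0, window, size=n_samples)
    values = rng.integers(0, n_values, size=(n_samples, window))
    x = np.concatenate([pointers[:, None], values], axis=1)   # (n_samples, window + 1)
    y = values[np.arange(n_samples), pointers]                # value at the pointed location
    return x, y

x, y = make_pvr_batch(4)
print(x)
print(y)
```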
1 code implementation • ICLR 2021 • Aniruddh Raghu, Maithra Raghu, Simon Kornblith, David Duvenaud, Geoffrey Hinton
We find that commentaries can improve training speed and/or performance, and provide insights about the dataset and training process.
4 code implementations • ICLR 2021 • Thao Nguyen, Maithra Raghu, Simon Kornblith
We begin by investigating how varying depth and width affects model hidden representations, finding a characteristic block structure in the hidden representations of larger capacity (wider or deeper) models.
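The block structure is visible in layer-by-layer similarity heatmaps. The sketch below implements linear centered kernel alignment (CKA), the kind of representational similarity measure used for such comparisons; the activation matrices here are random placeholders. Computing it for every pair of layers yields the heatmaps in which the block structure appears.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between two activation matrices of
    shape (n_examples, n_features); values near 1 indicate similar representations."""
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    hsic = np.linalg.norm(Y.T @ X, "fro") ** 2
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 64))             # placeholder layer activations
Q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random rotation
print(linear_cka(acts_a, acts_a @ Q))           # ~1.0: CKA is invariant to rotations
print(linear_cka(acts_a, rng.normal(size=(500, 64))))  # low for unrelated activations
```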
no code implementations • ICLR 2021 • Vinay V. Ramasesh, Ethan Dyer, Maithra Raghu
A central challenge in developing versatile machine learning systems is catastrophic forgetting: a model trained on tasks in sequence will suffer significant performance drops on earlier tasks.
1 code implementation • 26 Mar 2020 • Maithra Raghu, Eric Schmidt
Over the past few years, we have seen fundamental breakthroughs in core problems in machine learning, largely driven by advances in deep neural networks.
2 code implementations • ICLR 2020 • Aniruddh Raghu, Maithra Raghu, Samy Bengio, Oriol Vinyals
We conclude with a discussion of the rapid learning vs. feature reuse question for meta-learning algorithms more broadly.
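As a hedged sketch of the feature-reuse view, the snippet below performs ANIL-style inner-loop adaptation: the meta-learned body is reused unchanged and only a copy of the head is updated on a task's support set. The architecture, task shape, and hyperparameters are placeholders, not the paper's implementation.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Meta-learned components (placeholders): a feature-extracting body and a task head.
body = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 64), nn.ReLU())
head = nn.Linear(64, 5)

def inner_loop_adapt(support_x, support_y, steps=5, lr=0.1):
    """ANIL-style adaptation: reuse the body's features unchanged and update
    only a copy of the head on the task's support set."""
    features = body(support_x).detach()          # feature reuse: no body updates
    adapted_head = nn.Linear(64, 5)
    adapted_head.load_state_dict(head.state_dict())
    opt = torch.optim.SGD(adapted_head.parameters(), lr=lr)
    for _ in range(steps):
        loss = nn.functional.cross_entropy(adapted_head(features), support_y)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return adapted_head

# Toy 5-way task with 16-dimensional inputs.
support_x, support_y = torch.randn(25, 16), torch.randint(0, 5, (25,))
adapted_head = inner_loop_adapt(support_x, support_y)
```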
1 code implementation • 28 Mar 2019 • Maithra Raghu, Katy Blumer, Greg Corrado, Jon Kleinberg, Ziad Obermeyer, Sendhil Mullainathan
In a wide array of areas, algorithms are matching and surpassing the performance of human experts, leading to consideration of the roles of human judgment and algorithmic prediction in these domains.
2 code implementations • NeurIPS 2019 • Maithra Raghu, Chiyuan Zhang, Jon Kleinberg, Samy Bengio
Investigating the learned representations and features, we find that some of the differences from transfer learning are due to the over-parametrization of standard models rather than sophisticated feature reuse.
no code implementations • 4 Jul 2018 • Maithra Raghu, Katy Blumer, Rory Sayres, Ziad Obermeyer, Robert Kleinberg, Sendhil Mullainathan, Jon Kleinberg
Our central methodological finding is that Direct Uncertainty Prediction (DUP), which trains a model to predict an uncertainty score directly from the raw patient features, works better than Uncertainty Via Classification, which first trains a classifier and then post-processes its output distribution into an uncertainty score.
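A minimal sketch contrasting the two approaches on placeholder data: UVC trains a diagnosis classifier and converts its predictive distribution into an uncertainty score (here via entropy), while DUP fits a model directly to an uncertainty target such as grader disagreement. The features, labels, and models below are illustrative assumptions, not the paper's data or pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: patient features, a diagnosis label, and an uncertainty
# target (e.g. whether several graders disagreed on the case).
X = rng.normal(size=(1000, 20))
diagnosis = (X[:, 0] + 0.5 * rng.normal(size=1000) > 0).astype(int)
graders_disagree = (np.abs(X[:, 0]) < 0.3).astype(int)    # ambiguous borderline cases

# Uncertainty Via Classification (UVC): train a diagnosis classifier, then
# post-process its predictive distribution into an uncertainty score.
clf = LogisticRegression().fit(X, diagnosis)
p = clf.predict_proba(X)
uvc_score = -(p * np.log(p + 1e-12)).sum(axis=1)          # predictive entropy

# Direct Uncertainty Prediction (DUP): fit a model to predict the
# uncertainty target directly from the raw features.
dup = LogisticRegression().fit(X, graders_disagree)
dup_score = dup.predict_proba(X)[:, 1]
```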
2 code implementations • NeurIPS 2018 • Ari S. Morcos, Maithra Raghu, Samy Bengio
Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training.
2 code implementations • ICLR 2018 • Justin Gilmer, Luke Metz, Fartash Faghri, Samuel S. Schoenholz, Maithra Raghu, Martin Wattenberg, Ian Goodfellow
We hypothesize that this counterintuitive behavior is a naturally occurring result of the high dimensional geometry of the data manifold.
1 code implementation • ICML 2018 • Maithra Raghu, Alex Irpan, Jacob Andreas, Robert Kleinberg, Quoc V. Le, Jon Kleinberg
Deep reinforcement learning has achieved many recent successes, but our understanding of its strengths and limitations is hampered by the lack of rich environments in which we can fully characterize optimal behavior, and correspondingly diagnose individual actions against such a characterization.
3 code implementations • NeurIPS 2017 • Maithra Raghu, Justin Gilmer, Jason Yosinski, Jascha Sohl-Dickstein
We propose a new technique, Singular Vector Canonical Correlation Analysis (SVCCA), a tool for quickly comparing two representations in a way that is both invariant to affine transform (allowing comparison between different layers and networks) and fast to compute (allowing more comparisons to be calculated than with previous methods).
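A minimal SVCCA sketch following the two steps named above: an SVD keeps the top directions of each representation, then CCA is run on the reduced representations, and the mean canonical correlation summarizes similarity. The variance threshold and the toy activation matrices are assumptions.

```python
import numpy as np

def svcca(X, Y, keep=0.99):
    """Minimal SVCCA sketch for activation matrices of shape (n_examples, n_neurons):
    keep the top SVD directions covering `keep` of the variance, then run CCA on
    the reduced representations and return the mean canonical correlation."""
    def reduce(A):
        A = A - A.mean(axis=0, keepdims=True)
        U, s, _ = np.linalg.svd(A, full_matrices=False)
        k = np.searchsorted(np.cumsum(s**2) / np.sum(s**2), keep) + 1
        return U[:, :k] * s[:k]
    Xr, Yr = reduce(X), reduce(Y)
    qx, _ = np.linalg.qr(Xr)                     # CCA via orthonormal bases
    qy, _ = np.linalg.qr(Yr)
    corrs = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return corrs.mean()

rng = np.random.default_rng(0)
acts_a = rng.normal(size=(500, 64))              # placeholder layer activations
acts_b = acts_a @ rng.normal(size=(64, 32)) + 0.1 * rng.normal(size=(500, 32))
print(svcca(acts_a, acts_b))                     # high: B is close to a linear image of A
print(svcca(acts_a, rng.normal(size=(500, 32)))) # much lower for unrelated activations
```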
1 code implementation • 5 Apr 2017 • Ravi Kumar, Maithra Raghu, Tamas Sarlos, Andrew Tomkins
We introduce LAMP: the Linear Additive Markov Process.
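One common reading of LAMP is that the next-state distribution is a lag-weighted mixture of a single transition matrix applied to recent states, i.e. P(X_{t+1} | history) = sum_i w_i P[x_{t+1-i}, :]. The sketch below samples from such a process; the transition matrix and lag weights are toy values, not fitted parameters.

```python
import numpy as np

def sample_lamp(P, weights, init, n_steps, seed=0):
    """Sample a sequence from a Linear Additive Markov Process (sketch).
    P: (S, S) row-stochastic transition matrix.
    weights: lag weights w_1..w_K summing to 1; the next-state distribution
             is sum_i weights[i] * P[x_{t-i}, :].
    init: at least K initial states."""
    rng = np.random.default_rng(seed)
    K = len(weights)
    seq = list(init)
    for _ in range(n_steps):
        recent = seq[-K:][::-1]                       # most recent state first
        dist = sum(w * P[s] for w, s in zip(weights, recent))
        seq.append(int(rng.choice(len(P), p=dist)))
    return seq

P = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(sample_lamp(P, weights=[0.7, 0.3], init=[0, 0], n_steps=12))
```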
no code implementations • 24 Nov 2016 • Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
This quantity grows exponentially in the depth of the network, and is responsible for the depth sensitivity observed.
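The quantity in question is the length of an input trajectory as it is propagated through the network. The hedged sketch below pushes a circle of inputs through random-weight tanh layers and prints how its arc length grows with depth; the width, depth, and weight scale are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth, sigma_w = 200, 10, 3.0     # illustrative; growth depends on the weight scale

# A one-dimensional circle of inputs embedded in the input space.
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
x = np.zeros((1000, width))
x[:, 0], x[:, 1] = np.cos(t), np.sin(t)

def arc_length(points):
    return np.linalg.norm(np.diff(points, axis=0), axis=1).sum()

h = x
for layer in range(depth):
    W = rng.normal(0, sigma_w / np.sqrt(width), size=(width, width))
    h = np.tanh(h @ W)
    print(f"layer {layer + 1}: trajectory length = {arc_length(h):.1f}")
```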
no code implementations • ICML 2017 • Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.
1 code implementation • NeurIPS 2016 • Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli
We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights.
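A small sketch of this ordered-versus-chaotic signal propagation: two nearby inputs are pushed through a deep random-weight tanh network, and their distance contracts when the weight variance is small but expands when it is large. The width, depth, and weight scales below are illustrative assumptions, not the paper's mean-field calculation.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 500, 20

def pair_distances(sigma_w, eps=1e-3):
    """Push two nearby inputs through a deep random-weight tanh network and
    return their distance after each layer."""
    x1 = rng.normal(size=width)
    x2 = x1 + eps * rng.normal(size=width)
    dists = []
    for _ in range(depth):
        W = rng.normal(0, sigma_w / np.sqrt(width), size=(width, width))
        x1, x2 = np.tanh(W @ x1), np.tanh(W @ x2)
        dists.append(np.linalg.norm(x1 - x2))
    return dists

# Ordered regime: nearby inputs converge.  Chaotic regime: they diverge.
print("sigma_w = 0.8:", [f"{d:.1e}" for d in pair_distances(0.8)[::5]])
print("sigma_w = 2.5:", [f"{d:.1e}" for d in pair_distances(2.5)[::5]])
```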