1 code implementation • 6 Nov 2024 • Kai Sandbrink, Jan P. Bauer, Alexandra M. Proca, Andrew M. Saxe, Christopher Summerfield, Ali Hummos
We observe that the weights self-organize into modules specialized for the tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions).
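A minimal sketch of the kind of gated modular linear network this abstract describes, assuming a softmax gate that mixes K candidate weight modules; the names, shapes, and gating form here are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

rng = np.random.default_rng(0)
K, d_in, d_out = 3, 8, 4                          # number of modules, input/output dims
W = rng.normal(0.0, 0.1, size=(K, d_out, d_in))   # K candidate weight modules
c = np.zeros(K)                                   # gate pre-activations, one per module

def forward(x):
    g = np.exp(c - c.max())
    g = g / g.sum()                        # softmax gate over modules
    y = np.einsum('k,koi,i->o', g, W, x)   # gate-weighted mixture of module outputs
    return y, g

x = rng.normal(size=d_in)
y, g = forward(x)
print("gate values (which module is switched in):", g)
```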
1 code implementation • 23 Sep 2024 • Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe
Our results shed light on the difficulty of module specialization, what is required for modules to successfully specialize, and the necessity of modular architectures to achieve systematicity.
no code implementations • 22 Sep 2024 • Clémentine C. J. Dominé, Nicolas Anguita, Alexandra M. Proca, Lukas Braun, Daniel Kunin, Pedro A. M. Mediano, Andrew M. Saxe
Biological and artificial neural networks develop internal representations that enable them to perform complex tasks.
2 code implementations • 10 Apr 2024 • Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe
By clamping subsets of activations throughout training, we then identify three underlying subcircuits that interact to drive IH formation, yielding the phase change.
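The abstract refers to clamping subsets of activations during training. A hedged toy illustration of that general intervention, using a small NumPy network rather than the paper's transformer setup (all sizes, names, and the choice of clamped value are assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ rng.normal(size=5)                  # toy regression targets
W1 = rng.normal(0.0, 0.1, size=(5, 16))
W2 = rng.normal(0.0, 0.1, size=(16, 1))
clamp_idx = np.arange(8)                    # subset of hidden units to clamp
clamp_val = 0.0

for step in range(500):
    pre = X @ W1
    h = np.maximum(pre, 0.0)                # ReLU hidden activations
    h[:, clamp_idx] = clamp_val             # clamp the chosen subset on every step
    err = (h @ W2)[:, 0] - y
    dW2 = h.T @ err[:, None] / len(X)
    dh = err[:, None] @ W2.T
    dh[:, clamp_idx] = 0.0                  # clamped units pass no gradient back
    dW1 = X.T @ (dh * (pre > 0)) / len(X)
    W1 -= 0.1 * dW1
    W2 -= 0.1 * dW2

print("training loss with units clamped:", float(np.mean(err ** 2)))
```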
no code implementations • 14 Feb 2024 • Loek van Rossem, Andrew M. Saxe
We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes.
2 code implementations • NeurIPS 2023 • Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill
The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models.
no code implementations • 30 Oct 2023 • Rodrigo Carrasco-Davis, Javier Masís, Andrew M. Saxe
Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems.
no code implementations • 22 Feb 2023 • Anika T. Löwe, Léo Touzo, Paul S. Muhle-Karbe, Andrew M. Saxe, Christopher Summerfield, Nicolas W. Schuck
Humans sometimes have an insight that leads to a sudden and drastic performance improvement on the task they are working on.
no code implementations • 21 Jul 2022 • Andrew M. Saxe, Shagun Sodhani, Sam Lewallen
Our theoretical understanding of deep learning has not kept pace with its empirical success.
3 code implementations • NeurIPS 2019 • Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
Deep neural networks achieve stellar generalisation even when they have enough parameters to easily fit all their training data.
no code implementations • 25 Jan 2019 • Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová
Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data.
1 code implementation • 23 Oct 2018 • Andrew M. Saxe, James L. McClelland, Surya Ganguli
An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?
no code implementations • 3 Jun 2018 • Yamini Bansal, Madhu Advani, David D. Cox, Andrew M. Saxe
To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify "support vector"-like examples.
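A rough sketch of that idea under simplifying assumptions (an overparameterized linear model with per-example equality constraints and plain primal-dual gradient updates; not the paper's exact algorithm): each example's multiplier accumulates, i.e. integrates, that example's error over training.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 50))         # more parameters than examples
y = X @ rng.normal(size=50)           # noiseless targets, so e_i = 0 is feasible
w = np.zeros(50)
lam = np.zeros(20)                    # one Lagrange multiplier per training example

for step in range(3000):
    e = X @ w - y                     # per-example errors (constraints e_i = 0)
    # primal step on L(w, lam) = 0.5 * ||w||^2 + lam . e
    w -= 0.05 * (w + X.T @ lam)
    # dual step: each multiplier integrates its example's error over training
    lam += 0.002 * e

print("max constraint violation:", float(np.abs(X @ w - y).max()))
print("largest-multiplier ('support vector'-like) examples:", np.argsort(-np.abs(lam))[:5])
```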
no code implementations • 5 Mar 2018 • Yao Zhang, Andrew M. Saxe, Madhu S. Advani, Alpha A. Lee
We derive a correspondence between parameter inference and free energy minimisation in statistical physics.
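For orientation, the standard statistical-physics identification that this kind of correspondence builds on can be written as follows (a sketch of the usual Boltzmann-form posterior and free energy, not necessarily the paper's exact formulation):

```latex
% Treat the negative log joint as an energy; the posterior is then a
% Boltzmann distribution and the log evidence is (minus) a free energy.
\begin{align}
  E(\theta) &= -\log p(D \mid \theta) - \log p(\theta), \\
  p(\theta \mid D) &= \frac{e^{-\beta E(\theta)}}{Z(\beta)},
  \qquad Z(\beta) = \int e^{-\beta E(\theta)}\, d\theta, \\
  F(\beta) &= -\tfrac{1}{\beta}\log Z(\beta)
  \;=\; \langle E \rangle - \tfrac{1}{\beta} S,
\end{align}
% with beta = 1 recovering Bayesian inference: minimising F trades off
% fit (expected energy) against entropy.
```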
no code implementations • 10 Oct 2017 • Madhu S. Advani, Andrew M. Saxe
We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset.
no code implementations • ICML 2017 • Andrew M. Saxe, Adam C. Earle, Benjamin Rosman
Hierarchical architectures are critical to the scalability of reinforcement learning methods.
no code implementations • ICLR 2018 • Adam C. Earle, Andrew M. Saxe, Benjamin Rosman
Hierarchical reinforcement learning methods offer a powerful means of planning flexible behavior in complicated domains.
no code implementations • 8 Dec 2016 • Andrew M. Saxe, Adam Earle, Benjamin Rosman
Hierarchical architectures are critical to the scalability of reinforcement learning methods.
no code implementations • 7 Jun 2016 • Tommaso Furlanello, Jiaping Zhao, Andrew M. Saxe, Laurent Itti, Bosco S. Tjan
Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially.
1 code implementation • 19 Dec 2014 • Ian J. Goodfellow, Oriol Vinyals, Andrew M. Saxe
Training neural networks involves solving large-scale non-convex optimization problems.
2 code implementations • 20 Dec 2013 • Andrew M. Saxe, James L. McClelland, Surya Ganguli
We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth-independent learning times.
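A minimal NumPy sketch of random orthogonal weight initialization of the kind described here; the QR-based construction, gain, and layer sizes are illustrative choices:

```python
import numpy as np

def orthogonal_init(n_out, n_in, gain=1.0, seed=0):
    """Return an (n_out, n_in) weight matrix with orthonormal rows or columns."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(max(n_out, n_in), min(n_out, n_in)))
    q, r = np.linalg.qr(a)               # q has orthonormal columns
    q = q * np.sign(np.diag(r))          # sign fix so the orthogonal matrix is uniform
    if n_out < n_in:
        q = q.T
    return gain * q

W = orthogonal_init(256, 256)
print(np.allclose(W @ W.T, np.eye(256)))   # True: W is orthogonal
```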