Search Results for author: Andrew M. Saxe

Found 21 papers, 8 papers with code

Flexible task abstractions emerge in linear networks with fast and bounded units

1 code implementation • 6 Nov 2024 • Kai Sandbrink, Jan P. Bauer, Alexandra M. Proca, Andrew M. Saxe, Christopher Summerfield, Ali Hummos

We observe that the weights self-organize into modules specialized for the tasks or sub-tasks encountered, while the gating layer forms unique representations that switch in the appropriate weight modules (task abstractions).
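
A minimal sketch of the kind of architecture this abstract describes: a linear network whose output mixes several weight modules through bounded gates that adapt on a faster timescale than the weights. All sizes, learning rates, and the two synthetic tasks below are illustrative assumptions, not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out, n_modules = 10, 5, 2

# Weight modules and bounded gating units (assumed architecture)
W = [rng.normal(scale=0.1, size=(d_out, d_in)) for _ in range(n_modules)]
g = np.full(n_modules, 1.0 / n_modules)

# Two hypothetical linear tasks, presented in alternating blocks
tasks = [rng.normal(size=(d_out, d_in)) for _ in range(n_modules)]
lr_w, lr_g = 0.01, 0.5                           # gates are the "fast" units

for step in range(4000):
    k = (step // 500) % n_modules                # task switch every 500 steps
    x = rng.normal(size=d_in)
    y = tasks[k] @ x

    outs = [W[m] @ x for m in range(n_modules)]
    y_hat = sum(g[m] * outs[m] for m in range(n_modules))
    err = y_hat - y                              # gradient of 0.5 * ||err||^2

    grad_g = np.array([err @ outs[m] for m in range(n_modules)])
    for m in range(n_modules):
        W[m] -= lr_w * g[m] * np.outer(err, x)   # slow weight update
    g = np.clip(g - lr_g * grad_g, 0.0, 1.0)     # fast, bounded gate update

print("final gates:", np.round(g, 2))
```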

On The Specialization of Neural Modules

1 code implementation • 23 Sep 2024 • Devon Jarvis, Richard Klein, Benjamin Rosman, Andrew M. Saxe

Our results shed light on the difficulty of module specialization, what is required for modules to successfully specialize, and the necessity of modular architectures to achieve systematicity.

Systematic Generalization

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation

2 code implementations • 10 Apr 2024 • Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe

By clamping subsets of activations throughout training, we then identify three underlying subcircuits that interact to drive induction head (IH) formation, yielding the phase change.

In-Context Learning
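
As a reminder of the mechanism being studied: an induction head implements a prefix-matching copy rule, attending to the token that followed the most recent earlier occurrence of the current token and predicting that token. The function below is a hand-written sketch of that rule, not the trained attention circuit analysed in the paper.

```python
def induction_head_prediction(tokens):
    """Toy induction-head rule: find the most recent earlier occurrence of the
    final token and predict the token that immediately followed it."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):   # scan the prefix backwards
        if tokens[i] == current:
            return tokens[i + 1]
    return None                                # nothing to copy from

# In-context pattern "A B ... A" -> predict "B"
print(induction_head_prediction(["A", "B", "C", "D", "A"]))  # B
```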

When Representations Align: Universality in Representation Learning Dynamics

no code implementations • 14 Feb 2024 • Loek van Rossem, Andrew M. Saxe

We show through experiments that the effective theory describes aspects of representation learning dynamics across a range of deep networks with different activation functions and architectures, and exhibits phenomena similar to the "rich" and "lazy" regimes.

Representation Learning

The Transient Nature of Emergent In-Context Learning in Transformers

2 code implementations • NeurIPS 2023 • Aaditya K. Singh, Stephanie C. Y. Chan, Ted Moskovitz, Erin Grant, Andrew M. Saxe, Felix Hill

The transient nature of ICL is observed in transformers across a range of model sizes and datasets, raising the question of how much to "overtrain" transformers when seeking compact, cheaper-to-run models.

Bayesian Inference • In-Context Learning • +1

Meta-Learning Strategies through Value Maximization in Neural Networks

no code implementations • 30 Oct 2023 • Rodrigo Carrasco-Davis, Javier Masís, Andrew M. Saxe

Understanding how to make these meta-learning choices could offer normative accounts of cognitive control functions in biological learners and improve engineered systems.

Continual Learning • Meta-Learning

Abrupt and spontaneous strategy switches emerge in simple regularised neural networks

no code implementations • 22 Feb 2023 • Anika T. Löwe, Léo Touzo, Paul S. Muhle-Karbe, Andrew M. Saxe, Christopher Summerfield, Nicolas W. Schuck

Humans sometimes have an insight that leads to a sudden and drastic performance improvement on the task they are working on.

The Neural Race Reduction: Dynamics of Abstraction in Gated Networks

no code implementations • 21 Jul 2022 • Andrew M. Saxe, Shagun Sodhani, Sam Lewallen

Our theoretical understanding of deep learning has not kept pace with its empirical success.

Generalisation dynamics of online learning in over-parameterised neural networks

no code implementations • 25 Jan 2019 • Sebastian Goldt, Madhu S. Advani, Andrew M. Saxe, Florent Krzakala, Lenka Zdeborová

Deep neural networks achieve stellar generalisation on a variety of problems, despite often being large enough to easily fit all their training data.

A mathematical theory of semantic development in deep neural networks

1 code implementation • 23 Oct 2018 • Andrew M. Saxe, James L. McClelland, Surya Ganguli

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?

Semantic Similarity • Semantic Textual Similarity
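
The theory in this paper builds on deep linear networks, in which learning is driven by the singular modes of the input-output map and stronger modes are acquired earlier, producing stage-like transitions. The script below is a small numerical illustration of that behaviour on an assumed toy hierarchical dataset; it is a sketch of the phenomenon, not the paper's derivation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy hierarchical dataset: 4 items (one-hot inputs), 7 tree-like features
X = np.eye(4)
Y = np.array([[1, 1, 1, 1],      # feature shared by all items
              [1, 1, 0, 0],      # feature shared by items 1-2
              [0, 0, 1, 1],      # feature shared by items 3-4
              [1, 0, 0, 0],      # item-specific features
              [0, 1, 0, 0],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)

# Singular modes of the input-output map; stronger modes should be learned first
U, S, Vt = np.linalg.svd(Y, full_matrices=False)

# Two-layer (deep) linear network with small random initialisation
h, lr = 16, 0.1
W1 = rng.normal(scale=1e-3, size=(h, 4))
W2 = rng.normal(scale=1e-3, size=(7, h))

for t in range(501):
    err = W2 @ W1 @ X - Y
    if t % 100 == 0:
        modes = np.diag(U.T @ (W2 @ W1) @ Vt.T)    # strength of each learned mode
        print(f"step {t:3d}  modes {np.round(modes, 2)}  targets {np.round(S, 2)}")
    gW2 = err @ (W1 @ X).T / 4
    gW1 = W2.T @ err @ X.T / 4
    W2 -= lr * gW2
    W1 -= lr * gW1
```

Running the sketch shows the mode strengths rising one after another, strongest first, rather than all at once.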

Minnorm training: an algorithm for training over-parameterized deep neural networks

no code implementations • 3 Jun 2018 • Yamini Bansal, Madhu Advani, David D. Cox, Andrew M. Saxe

To solve this constrained optimization problem, our method employs Lagrange multipliers that act as integrators of error over training and identify `support vector'-like examples.

Generalization Bounds
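
The "Lagrange multipliers as integrators of error" idea can be sketched on the simplest possible case: minimising the weight norm of a linear model subject to fitting the training set, with a primal step on the weights and a dual step that accumulates each example's error. The problem sizes and step sizes below are assumptions for illustration; the paper applies the idea to deep networks.

```python
import numpy as np

rng = np.random.default_rng(0)

# Over-parameterised toy problem: more parameters than examples
n, d = 20, 50
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = rng.normal(size=n)

w = np.zeros(d)
lam = np.zeros(n)                      # one multiplier per training example
lr_w, lr_lam = 0.05, 0.05

# Minimise ||w||^2 subject to X w = y  (Lagrangian: ||w||^2 + lam . (X w - y))
for _ in range(20000):
    err = X @ w - y
    w -= lr_w * (2 * w + X.T @ lam)    # primal descent step on the Lagrangian
    lam += lr_lam * err                # multipliers integrate the error over training

print("max train error:", np.abs(X @ w - y).max())
print("||w||:", np.linalg.norm(w),
      " min-norm ||w*||:", np.linalg.norm(np.linalg.pinv(X) @ y))
```

Examples that end up with large |lam| are the ones pinning down the solution, which is the "support vector"-like behaviour the abstract mentions.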

High-dimensional dynamics of generalization error in neural networks

no code implementations • 10 Oct 2017 • Madhu S. Advani, Andrew M. Saxe

We study the practically relevant "high-dimensional" regime where the number of free parameters in the network is on the order of or even larger than the number of examples in the dataset.

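A quick numerical illustration of this regime (not the paper's analytical treatment): minimum-norm least squares fit to a random linear teacher, sweeping the number of parameters past the number of training examples. The teacher, noise level, and sizes below are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def excess_test_mse(n_train, d, n_test=2000, noise=0.1):
    """Fit min-norm least squares to a random linear teacher; return test MSE."""
    w_star = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n_train, d))
    y = X @ w_star + noise * rng.normal(size=n_train)
    w_hat = np.linalg.pinv(X) @ y                 # minimum-norm solution
    X_test = rng.normal(size=(n_test, d))
    return np.mean((X_test @ (w_hat - w_star)) ** 2)

n_train = 100
for d in [20, 50, 90, 100, 110, 200, 400]:        # parameters vs. fixed examples
    mse = np.mean([excess_test_mse(n_train, d) for _ in range(5)])
    print(f"params/examples = {d / n_train:4.1f}   test MSE = {mse:.3f}")
```

The test error typically spikes when the parameter count is close to the number of examples and falls again beyond it, the kind of behaviour characterised in this high-dimensional regime.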

Active Long Term Memory Networks

no code implementations • 7 Jun 2016 • Tommaso Furlanello, Jiaping Zhao, Andrew M. Saxe, Laurent Itti, Bosco S. Tjan

Continual Learning in artificial neural networks suffers from interference and forgetting when different tasks are learned sequentially.

Continual Learning • Domain Adaptation

Qualitatively characterizing neural network optimization problems

1 code implementation • 19 Dec 2014 • Ian J. Goodfellow, Oriol Vinyals, Andrew M. Saxe

Training neural networks involves solving large-scale non-convex optimization problems.

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

2 code implementations • 20 Dec 2013 • Andrew M. Saxe, James L. McClelland, Surya Ganguli

We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times.

Unsupervised Pre-training
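
Random orthogonal initial conditions of the kind the abstract refers to are commonly obtained by orthogonalising a Gaussian matrix, e.g. with a QR decomposition. The helper below is a minimal sketch of that idea; the gain parameter and the sign correction are standard practice rather than details taken from the paper.

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0, rng=None):
    """Random (semi-)orthogonal weight matrix via QR of a Gaussian matrix."""
    rng = np.random.default_rng() if rng is None else rng
    a = rng.normal(size=(max(rows, cols), min(rows, cols)))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))          # fix column signs so the draw is uniform
    w = q if rows >= cols else q.T
    return gain * w

W = orthogonal_init(256, 128)
print(np.allclose(W.T @ W, np.eye(128), atol=1e-8))   # columns are orthonormal
```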
