Search Results for author: Hidenori Tanaka

Found 18 papers, 9 papers with code

Towards an Understanding of Stepwise Inference in Transformers: A Synthetic Graph Navigation Model

no code implementations • 12 Feb 2024 • Mikail Khona, Maya Okawa, Jan Hula, Rahul Ramesh, Kento Nishi, Robert Dick, Ekdeep Singh Lubana, Hidenori Tanaka

Stepwise inference protocols, such as scratchpads and chain-of-thought, help language models solve complex problems by decomposing them into a sequence of simpler subproblems.

Paper
Add Code

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks

no code implementations • 21 Nov 2023 • Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka

Transformers trained on huge text corpora exhibit a remarkable set of capabilities, e. g., performing basic arithmetic.

Paper
Add Code

Mechanistically analyzing the effects of fine-tuning on procedurally defined tasks

no code implementations • 21 Nov 2023 • Samyak Jain, Robert Kirk, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Edward Grefenstette, Tim Rocktäschel, David Scott Krueger

Fine-tuning large pre-trained models has become the de facto strategy for developing both task-specific and general-purpose machine learning systems, including developing models that are safe to deploy.

Network Pruning

Paper
Add Code

CORNN: Convex optimization of recurrent neural networks for rapid inference of neural dynamics

1 code implementation • NeurIPS 2023 • Fatih Dinc, Adam Shai, Mark Schnitzer, Hidenori Tanaka

Advances in optical and electrophysiological recording technologies have made it possible to record the dynamics of thousands of neurons, opening up new possibilities for interpreting and controlling large neural populations in behaving animals.

Paper
Code

In-Context Learning Dynamics with Random Binary Sequences

1 code implementation • 26 Oct 2023 • Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman

Large language models (LLMs) trained on huge corpora of text datasets demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for.

In-Context Learning

Paper
Code

Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task

1 code implementation • NeurIPS 2023 • Maya Okawa, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Motivated by this, we perform a controlled study for understanding compositional generalization in conditional diffusion models in a synthetic setting, varying different attributes of the training data and measuring the model's ability to generate samples out-of-distribution.

Paper
Code

Mechanistic Mode Connectivity

1 code implementation • 15 Nov 2022 • Ekdeep Singh Lubana, Eric J. Bigelow, Robert P. Dick, David Krueger, Hidenori Tanaka

We study neural network loss landscapes through the lens of mode connectivity, the observation that minimizers of neural networks retrieved via training on a dataset are connected via simple paths of low loss.

Paper
Code

What shapes the loss landscape of self-supervised learning?

no code implementations • 2 Oct 2022 • Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka

Prevention of complete and dimensional collapse of representations has recently become a design principle for self-supervised learning (SSL).

Self-Supervised Learning

Paper
Add Code

Noether’s Learning Dynamics: Role of Symmetry Breaking in Neural Networks

no code implementations • NeurIPS 2021 • Hidenori Tanaka, Daniel Kunin

In nature, symmetry governs regularities, while symmetry breaking brings texture.

Paper
Add Code

Rethinking the limiting dynamics of SGD: modified loss, phase space oscillations, and anomalous diffusion

no code implementations • 29 Sep 2021 • Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel LK Yamins

In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).

Paper
Add Code

The Limiting Dynamics of SGD: Modified Loss, Phase Space Oscillations, and Anomalous Diffusion

1 code implementation • 19 Jul 2021 • Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins

In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).

Paper
Code

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning

1 code implementation • NeurIPS 2021 • Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka

Inspired by BatchNorm, there has been an explosion of normalization layers in deep learning.

Paper
Code

Noether's Learning Dynamics: Role of Symmetry Breaking in Neural Networks

no code implementations • 6 May 2021 • Hidenori Tanaka, Daniel Kunin

In nature, symmetry governs regularities, while symmetry breaking brings texture.

Paper
Add Code

Symmetry, Conservation Laws, and Learning Dynamics in Neural Networks

no code implementations • ICLR 2021 • Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel LK Yamins, Hidenori Tanaka

Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state of the art architectures trained on any dataset.

Paper
Add Code

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

1 code implementation • 8 Dec 2020 • Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

Paper
Code

Pruning neural networks without any data by iteratively conserving synaptic flow

5 code implementations • NeurIPS 2020 • Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time.

210

Paper
Code

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

1 code implementation • NeurIPS 2019 • Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations to extracting and understand computational mechanisms.

Dimensionality Reduction

Paper
Code

Revealing computational mechanisms of retinal prediction via model reduction

no code implementations • NeurIPS Workshop Neuro_AI 2019 • Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Dimensionality Reduction

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.