Search Results for author: Niru Maheswaranathan

Found 26 papers, 10 papers with code

Understanding How Encoder-Decoder Architectures Attend

no code implementations NeurIPS 2021 Kyle Aitken, Vinay V. Ramasesh, Yuan Cao, Niru Maheswaranathan

Moreover, how these mechanisms vary depending on the particular architecture used for the encoder and decoder (recurrent, feed-forward, etc.) is also not well understood.

Training Learned Optimizers with Randomly Initialized Learned Optimizers

no code implementations 14 Jan 2021 Luke Metz, C. Daniel Freeman, Niru Maheswaranathan, Jascha Sohl-Dickstein

We show that a population of randomly initialized learned optimizers can be used to train themselves from scratch in an online fashion, without resorting to a hand designed optimizer in any part of the process.

Overcoming barriers to the training of effective learned optimizers

no code implementations 1 Jan 2021 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters.
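
To make "learned optimizer" concrete, here is a minimal, hypothetical sketch: the optimizer is a small parameterized update rule whose meta-parameters `theta` map per-parameter features (the gradient and a momentum term) to an update, and its quality is scored by unrolling it on a task. The three-parameter linear rule, the names `learned_update` and `unrolled_meta_loss`, and the quadratic test problem are illustrative assumptions; the papers listed here use neural-network optimizers meta-trained on large task distributions.

```python
import math
from typing import List, Sequence, Tuple


def learned_update(theta: Sequence[float], grad: float, momentum: float) -> float:
    """Hypothetical learned update rule for one parameter.

    theta = (log_lr, w_grad, w_mom): a learned log step size and learned
    weights on two input features (the gradient and its moving average).
    """
    log_lr, w_grad, w_mom = theta
    return -math.exp(log_lr) * (w_grad * grad + w_mom * momentum)


def unrolled_meta_loss(
    theta: Sequence[float],
    x0: Sequence[float],
    scales: Sequence[float],
    steps: int = 20,
    beta: float = 0.9,
) -> Tuple[float, List[float]]:
    """Apply the rule to f(x) = 0.5 * sum(s_i * x_i**2) for `steps` steps.

    Returns the mean loss over the unroll (one common meta-objective) and
    the per-step loss curve.
    """
    x = list(x0)
    m = [0.0] * len(x)
    losses: List[float] = []
    for _ in range(steps):
        grads = [s * xi for s, xi in zip(scales, x)]
        # Exponential moving average of gradients, used as the momentum feature.
        m = [beta * mi + (1.0 - beta) * g for mi, g in zip(m, grads)]
        x = [xi + learned_update(theta, g, mi) for xi, g, mi in zip(x, grads, m)]
        losses.append(0.5 * sum(s * xi * xi for s, xi in zip(scales, x)))
    return sum(losses) / len(losses), losses


if __name__ == "__main__":
    # With w_mom = 0 the rule reduces to plain SGD with learning rate 0.1.
    sgd_like = (math.log(0.1), 1.0, 0.0)
    meta_loss, curve = unrolled_meta_loss(sgd_like, [1.0, -2.0], [1.0, 4.0])
    print(f"meta-loss {meta_loss:.4f}, final loss {curve[-1]:.2e}")
```

Meta-training would then adjust `theta` (for example with evolution strategies or by differentiating through the unroll) to reduce the meta-loss averaged over many sampled tasks.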

The geometry of integration in text classification RNNs

1 code implementation ICLR 2021 Kyle Aitken, Vinay V. Ramasesh, Ankush Garg, Yuan Cao, David Sussillo, Niru Maheswaranathan

Using tools from dynamical systems analysis, we study recurrent networks trained on a battery of both natural and synthetic text classification tasks.
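
As a toy illustration of what "integration" means for a classifier RNN (a sketch under stated assumptions, not the trained networks studied in the paper): a linear RNN whose recurrent matrix has an eigenvalue of exactly 1 accumulates the input component along that eigenvector, while components along decaying eigenvectors are forgotten. The two-dimensional diagonal matrix, the token input vectors, and the names `step` and `run` are hypothetical choices.

```python
from typing import Dict, List, Sequence, Tuple

Vec = Tuple[float, float]

# Recurrent matrix written in its eigenbasis (so it is diagonal): eigenvalue 1.0
# along the integrating direction, 0.5 along a decaying direction.
EIGENVALUES: Vec = (1.0, 0.5)

# Hypothetical token inputs; the first coordinate carries the token's valence.
TOKEN_INPUTS: Dict[str, Vec] = {
    "good": (1.0, 0.3),
    "great": (2.0, 0.3),
    "bad": (-1.0, 0.3),
    "the": (0.0, 0.3),
}


def step(h: Vec, u: Vec) -> Vec:
    """One update h_{t+1} = W h_t + u_t of a linear RNN with diagonal W."""
    return (EIGENVALUES[0] * h[0] + u[0], EIGENVALUES[1] * h[1] + u[1])


def run(tokens: Sequence[str]) -> List[Vec]:
    """Hidden-state trajectory, starting from the origin, for a token sequence."""
    h: Vec = (0.0, 0.0)
    trajectory = [h]
    for token in tokens:
        h = step(h, TOKEN_INPUTS[token])
        trajectory.append(h)
    return trajectory


if __name__ == "__main__":
    final = run(["the", "good", "bad", "great", "the"])[-1]
    # First coordinate: summed valence 0 + 1 - 1 + 2 + 0 = 2.0 (perfect memory).
    # Second coordinate: approaches 0.3 / (1 - 0.5) = 0.6 regardless of order.
    print(final)
```

Without input, the fixed points of this system are exactly the points with second coordinate zero, a line of fixed points; that is the sense in which such a network has a line attractor.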

Classification, General Classification

Tasks, stability, architecture, and compute: Training more effective learned optimizers, and using them to train themselves

no code implementations 23 Sep 2020 Luke Metz, Niru Maheswaranathan, C. Daniel Freeman, Ben Poole, Jascha Sohl-Dickstein

In this work we focus on general-purpose learned optimizers capable of training a wide variety of problems with no user-specified hyperparameters.

How recurrent networks implement contextual processing in sentiment analysis

1 code implementation ICML 2020 Niru Maheswaranathan, David Sussillo

Here, we propose general methods for reverse engineering recurrent neural networks (RNNs) to identify and elucidate contextual processing.

Sentiment Analysis

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

1 code implementation NeurIPS 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations to extracting and understanding computational mechanisms.

Dimensionality Reduction

Revealing computational mechanisms of retinal prediction via model reduction

no code implementations NeurIPS Workshop Neuro_AI 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations to extracting and understanding computational mechanisms.

Dimensionality Reduction

Universality and individuality in neural dynamics across large populations of recurrent networks

1 code implementation NeurIPS 2019 Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

To address these foundational questions, we study populations of thousands of networks, with commonly used RNN architectures, trained to solve neuroscientifically motivated tasks and characterize their nonlinear dynamics.

Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

no code implementations NeurIPS 2019 Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task.
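
Reverse-engineering analyses of this kind typically start by numerically locating fixed points (or slow points) of the RNN update F, by minimizing q(h) = 0.5 * ||F(h) - h||^2 from many initial states. The sketch below does this with plain gradient descent for a hypothetical two-unit tanh RNN with its input held at zero; the weights, step size, tolerance, and function names are assumptions, and real analyses work with trained, high-dimensional networks and stronger optimizers.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def rnn_step(W: Matrix, b: Sequence[float], h: Sequence[float]) -> List[float]:
    """Autonomous update F(h) = tanh(W h + b), i.e. the RNN with zero input."""
    n = len(h)
    return [math.tanh(sum(W[i][j] * h[j] for j in range(n)) + b[i]) for i in range(n)]


def speed(W: Matrix, b: Sequence[float], h: Sequence[float]) -> float:
    """q(h) = 0.5 * ||F(h) - h||^2, which is zero exactly at fixed points."""
    f = rnn_step(W, b, h)
    return 0.5 * sum((fi - hi) ** 2 for fi, hi in zip(f, h))


def grad_speed(W: Matrix, b: Sequence[float], h: Sequence[float]) -> List[float]:
    """Gradient of q: (J - I)^T (F(h) - h), where J = diag(1 - F(h)^2) W."""
    n = len(h)
    f = rnn_step(W, b, h)
    r = [f[i] - h[i] for i in range(n)]
    # a[i][j] = d r_i / d h_j = J[i][j] - delta_ij
    a = [[(1.0 - f[i] ** 2) * W[i][j] - (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    return [sum(a[i][j] * r[i] for i in range(n)) for j in range(n)]


def find_fixed_point(W: Matrix, b: Sequence[float], h0: Sequence[float],
                     lr: float = 0.5, iters: int = 5000,
                     tol: float = 1e-14) -> List[float]:
    """Plain gradient descent on q starting from the initial state h0."""
    h = list(h0)
    for _ in range(iters):
        if speed(W, b, h) < tol:
            break
        g = grad_speed(W, b, h)
        h = [hi - lr * gi for hi, gi in zip(h, g)]
    return h


if __name__ == "__main__":
    W = [[0.5, -0.2], [0.1, 0.4]]  # hypothetical, contractive weights
    b = [0.1, -0.2]
    for h0 in ([0.8, -0.8], [-0.5, 0.5]):
        h_star = find_fixed_point(W, b, h0)
        print(h0, "->", [round(v, 6) for v in h_star], "q =", speed(W, b, h_star))
```

With these contractive weights there is a single fixed point, so both starting states land in the same place; in the sentiment networks, by contrast, the approximate fixed points found from many starts line up along a roughly one-dimensional manifold.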

General Classification, Sentiment Analysis

Using learned optimizers to make models robust to input noise

no code implementations 8 Jun 2019 Luke Metz, Niru Maheswaranathan, Jonathon Shlens, Jascha Sohl-Dickstein, Ekin D. Cubuk

State-of-the-art vision models can achieve superhuman performance on image classification tasks when testing and training data come from the same distribution.

General Classification, Image Classification

Learning Unsupervised Learning Rules

no code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Here, our desired task (meta-objective) is the performance of the representation on semi-supervised classification, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations that perform well under this meta-objective.

Meta-Learning

Learned optimizers that outperform on wall-clock and validation loss

no code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, Daniel Freeman, Jascha Sohl-Dickstein

We demonstrate these results on problems where our learned optimizer trains convolutional networks in a fifth of the wall-clock time compared to tuned first-order methods, and with an improvement

Guided Evolutionary Strategies: Escaping the curse of dimensionality in random search

no code implementations ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

This arises when an approximate gradient is easier to compute than the full gradient (e.g. in meta-learning or unrolled optimization), or when a true gradient is intractable and is replaced with a surrogate (e.g. in certain reinforcement learning applications or training networks with discrete variables).

Meta-Learning

Understanding and correcting pathologies in the training of learned optimizers

1 code implementation 24 Oct 2018 Luke Metz, Niru Maheswaranathan, Jeremy Nixon, C. Daniel Freeman, Jascha Sohl-Dickstein

Deep learning has shown that learned functions can dramatically outperform hand-designed functions on perceptual tasks.

Guided evolutionary strategies: Augmenting random search with surrogate gradients

1 code implementation ICLR 2019 Niru Maheswaranathan, Luke Metz, George Tucker, Dami Choi, Jascha Sohl-Dickstein

We propose Guided Evolutionary Strategies, a method for optimally using surrogate gradient directions along with random search.
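
A minimal sketch of the guided-ES gradient estimate, assuming a single surrogate direction (k = 1) so that the orthonormal basis U of the guiding subspace reduces to one unit vector `u`. Perturbations are drawn from N(0, sigma^2 * Sigma) with Sigma = (alpha/n) I + ((1 - alpha)/k) U U^T, and the antithetic estimate is beta / (2 sigma^2 P) * sum_i eps_i (f(x + eps_i) - f(x - eps_i)); this follows our reading of the method. The defaults for `sigma`, `alpha`, `beta`, and `pairs`, the function name, and the toy objective with a biased surrogate are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

Vector = List[float]


def guided_es_gradient(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    surrogate: Sequence[float],
    sigma: float = 0.1,
    alpha: float = 0.5,
    beta: float = 1.0,
    pairs: int = 10,
    rng: Optional[random.Random] = None,
) -> Vector:
    """Antithetic guided-ES estimate using one surrogate direction (k = 1).

    eps ~ N(0, sigma^2 * Sigma), Sigma = (alpha / n) I + (1 - alpha) u u^T,
    where u = surrogate / ||surrogate||.
    """
    if rng is None:
        rng = random.Random(0)
    n = len(x)
    norm = math.sqrt(sum(s * s for s in surrogate))
    if norm == 0.0:
        raise ValueError("surrogate direction must be nonzero")
    u = [s / norm for s in surrogate]
    grad = [0.0] * n
    for _ in range(pairs):
        z_full = [rng.gauss(0.0, 1.0) for _ in range(n)]  # full-space noise
        z_sub = rng.gauss(0.0, 1.0)                        # guiding-subspace noise
        eps = [sigma * (math.sqrt(alpha / n) * zf + math.sqrt(1.0 - alpha) * z_sub * ui)
               for zf, ui in zip(z_full, u)]
        f_plus = f([xi + ei for xi, ei in zip(x, eps)])
        f_minus = f([xi - ei for xi, ei in zip(x, eps)])
        weight = beta * (f_plus - f_minus) / (2.0 * sigma ** 2 * pairs)
        grad = [g + weight * ei for g, ei in zip(grad, eps)]
    return grad


if __name__ == "__main__":
    rng = random.Random(42)
    target = [1.0, -2.0, 0.5, 3.0]

    def loss(v: Sequence[float]) -> float:
        return sum((vi - ti) ** 2 for vi, ti in zip(v, target))

    x = [0.0] * len(target)
    for _ in range(200):
        true_grad = [2.0 * (xi - ti) for xi, ti in zip(x, target)]
        # Hypothetical biased surrogate: the true gradient plus a constant offset.
        surrogate = [g + 0.5 for g in true_grad]
        g = guided_es_gradient(loss, x, surrogate, rng=rng)
        x = [xi - 0.2 * gi for xi, gi in zip(x, g)]
    print(f"final loss: {loss(x):.4f}")
```

Setting `alpha = 1` gives vanilla antithetic ES with isotropic perturbations, while `alpha = 0` searches only along the surrogate direction; intermediate values trade the two off.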

Meta-Learning

Meta-Learning Update Rules for Unsupervised Representation Learning

2 code implementations ICLR 2019 Luke Metz, Niru Maheswaranathan, Brian Cheung, Jascha Sohl-Dickstein

Specifically, we target semi-supervised classification performance, and we meta-learn an algorithm -- an unsupervised weight update rule -- that produces representations useful for this task.

Meta-Learning, Representation Learning

Recurrent Segmentation for Variable Computational Budgets

no code implementations 28 Nov 2017 Lane McIntosh, Niru Maheswaranathan, David Sussillo, Jonathon Shlens

Importantly, the RNN may be deployed across a range of computational budgets by merely running the model for a variable number of iterations.
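
The variable-budget idea can be illustrated with a toy "anytime" loop (hypothetical, not the paper's architecture): one recurrent refinement step is applied repeatedly to a per-pixel state, and labels can be read out after any number of iterations, so a cheaper deployment simply stops earlier with the same weights. Here `refine` is a stand-in that adds each pixel's evidence to the mean of its neighbors' logits on a 1-D image; every name and constant is an assumption.

```python
from typing import List, Sequence


def refine(state: Sequence[float], evidence: Sequence[float],
           coupling: float = 0.5) -> List[float]:
    """One hypothetical recurrent step on a 1-D 'image': each pixel's logit is
    its own evidence plus `coupling` times the mean of its neighbors' logits."""
    n = len(state)
    updated = []
    for i in range(n):
        neighbors = [state[j] for j in (i - 1, i + 1) if 0 <= j < n]
        context = sum(neighbors) / len(neighbors) if neighbors else 0.0
        updated.append(evidence[i] + coupling * context)
    return updated


def segment(evidence: Sequence[float], budget: int) -> List[int]:
    """Run `budget` iterations with the same weights, then read out labels.
    A smaller budget costs less compute; nothing else about the model changes."""
    state = [0.0] * len(evidence)
    for _ in range(budget):
        state = refine(state, evidence)
    return [1 if logit > 0 else 0 for logit in state]


if __name__ == "__main__":
    # A weakly negative pixel inside a confidently positive region.
    evidence = [1.0, 1.0, 1.0, -0.3, 1.0, 1.0, 1.0]
    for budget in (1, 2, 5):
        print(budget, segment(evidence, budget))
```

After one iteration the weak pixel is labeled from its own evidence alone; from the second iteration on, context from its neighbors flips it to match the surrounding region.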

Semantic Segmentation, Video Segmentation

Learned Optimizers that Scale and Generalize

1 code implementation ICML 2017 Olga Wichrowska, Niru Maheswaranathan, Matthew W. Hoffman, Sergio Gomez Colmenarejo, Misha Denil, Nando de Freitas, Jascha Sohl-Dickstein

Two of the primary barriers to its adoption are an inability to scale to larger problems and a limited ability to generalize to new tasks.

Deep Learning Models of the Retinal Response to Natural Scenes

no code implementations NeurIPS 2016 Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus

Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs).
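
For reference, the linear-nonlinear (LN) baseline named above predicts a firing rate by projecting the recent stimulus onto a linear filter and passing the result through a static nonlinearity, r(t) = f(sum_tau k(tau) s(t - tau) + b). The sketch assumes a 1-D stimulus, a hand-picked filter, and a softplus nonlinearity; these choices and the function names are illustrative, not the fitted models from the paper.

```python
import math
from typing import List, Sequence


def softplus(x: float) -> float:
    """Smooth rectifier log(1 + e^x), written to avoid overflow for large x."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def ln_model(stimulus: Sequence[float], kernel: Sequence[float],
             bias: float = 0.0) -> List[float]:
    """Predicted rate r(t) = softplus(sum_tau kernel[tau] * stimulus[t - tau] + bias).

    kernel[0] weights the current time bin, kernel[1] the previous one, and so on.
    Bins before the filter is fully covered by the stimulus are skipped.
    """
    T, K = len(stimulus), len(kernel)
    rates = []
    for t in range(K - 1, T):
        drive = sum(kernel[tau] * stimulus[t - tau] for tau in range(K)) + bias
        rates.append(softplus(drive))
    return rates


if __name__ == "__main__":
    stimulus = [0.0, 0.0, 1.0, 0.0, 0.0, -1.0, 0.0]
    kernel = [1.0, -0.5]  # hypothetical biphasic temporal filter
    print([round(r, 3) for r in ln_model(stimulus, kernel)])
```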

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

4 code implementations 12 Mar 2015 Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable.
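
The approach of this paper is to slowly destroy structure in the data with a forward diffusion process and to learn the reverse process. A minimal sketch of the forward (noising) half for scalar data, assuming the Gaussian kernel q(x_t | x_{t-1}) = N(x_t; sqrt(1 - beta_t) * x_{t-1}, beta_t) and a hypothetical linear beta schedule; the learned reverse process is not shown, and the function names and schedule values are illustrative.

```python
import math
import random
from typing import List, Sequence, Tuple


def linear_betas(steps: int, beta_start: float = 1e-4,
                 beta_end: float = 0.05) -> List[float]:
    """Hypothetical linearly increasing noise schedule beta_1..beta_T."""
    if steps == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * t / (steps - 1) for t in range(steps)]


def forward_diffuse(x0: float, betas: Sequence[float],
                    rng: random.Random) -> List[float]:
    """Sample x_0, x_1, ..., x_T with x_t = sqrt(1 - beta_t) x_{t-1} + sqrt(beta_t) noise."""
    xs = [x0]
    for beta in betas:
        xs.append(math.sqrt(1.0 - beta) * xs[-1] + math.sqrt(beta) * rng.gauss(0.0, 1.0))
    return xs


def marginal(x0: float, betas: Sequence[float]) -> Tuple[float, float]:
    """Closed form q(x_T | x_0) = N(mean, var) for the same Gaussian kernel:
    mean = sqrt(prod(1 - beta)) * x0 and var = 1 - prod(1 - beta)."""
    keep = 1.0
    for beta in betas:
        keep *= 1.0 - beta
    return math.sqrt(keep) * x0, 1.0 - keep


if __name__ == "__main__":
    betas = linear_betas(1000)
    mean, var = marginal(3.0, betas)
    print(f"q(x_T | x_0 = 3): mean {mean:.2e}, variance {var:.6f}")
    print(forward_diffuse(3.0, betas, random.Random(0))[-1])
```

After enough steps the marginal approaches a standard normal regardless of x_0, which is what makes the end point of the forward process an easy, tractable distribution to sample from.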
