Search Results for author: Surya Ganguli

Found 69 papers, 36 papers with code

Short-term memory in neuronal networks through dynamical compressed sensing

no code implementations NeurIPS 2010 Surya Ganguli, Haim Sompolinsky

Prior work, in the case of Gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can outperform an equivalent feedforward network.

Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods

1 code implementation 9 Nov 2013 Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli

This algorithm contrasts with earlier stochastic second order techniques that treat the Hessian of each contributing function as a noisy approximation to the full Hessian, rather than as a target for direct estimation.

Computational Efficiency

A memory frontier for complex synapses

no code implementations NeurIPS 2013 Subhaneil Lahiri, Surya Ganguli

An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses.

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

3 code implementations 20 Dec 2013 Andrew M. Saxe, James L. McClelland, Surya Ganguli

We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times.
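
As a minimal illustration of the random orthogonal initialization mentioned above, one can draw a Haar-distributed orthogonal weight matrix from the QR decomposition of a Gaussian matrix; the sketch below is my own (the size and gain are arbitrary choices), not the paper's code.

    import numpy as np

    def random_orthogonal(n, gain=1.0, seed=0):
        # QR-decompose a Gaussian matrix and fix column signs so the result
        # is approximately uniform over the orthogonal group.
        rng = np.random.default_rng(seed)
        q, r = np.linalg.qr(rng.standard_normal((n, n)))
        q *= np.sign(np.diag(r))                   # sign correction, broadcast over columns
        return gain * q

    W = random_orthogonal(256)
    print(np.allclose(W.T @ W, np.eye(256)))       # True: W is numerically orthogonal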

Unsupervised Pre-training

On the saddle point problem for non-convex optimization

no code implementations 19 May 2014 Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.

Analyzing noise in autoencoders and deep networks

no code implementations 6 Jun 2014 Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli

Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features.

Denoising

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

6 code implementations 12 Mar 2015 Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable.
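
This is the paper that introduced the diffusion-model framework later popularized for image generation. Below is a minimal sketch of the forward (noising) half of such a model, in which data are gradually destroyed by Gaussian noise; the linear beta schedule, number of steps, and 2-D toy data are my own arbitrary choices, and the learned reverse process is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 2e-2, T)             # noise schedule (arbitrary choice)
    alpha_bar = np.cumprod(1.0 - betas)            # running product of (1 - beta_t)

    x0 = rng.standard_normal((512, 2)) * np.array([2.0, 0.2])   # anisotropic toy "data"

    def q_sample(x0, t):
        # Closed form of the forward process: x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I)
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

    for t in (0, 100, 999):
        print(t, q_sample(x0, t).std(axis=0))      # per-dimension spread approaches 1 as structure is destroyed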

Deep Knowledge Tracing

6 code implementations NeurIPS 2015 Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein

Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is a well-established problem in computer-supported education.

Knowledge Tracing

Statistical Mechanics of High-Dimensional Inference

no code implementations 18 Jan 2016 Madhu Advani, Surya Ganguli

Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference cannot achieve these limits.

Bayesian Inference, Vocal Bursts Intensity Prediction

A universal tradeoff between power, precision and speed in physical communication

no code implementations 24 Mar 2016 Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli

Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal.

Friction

On the Expressive Power of Deep Neural Networks

no code implementations ICML 2017 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.

Exponential expressivity in deep neural networks through transient chaos

1 code implementation NeurIPS 2016 Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights.
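
A rough numerical sketch of the kind of mean-field recursion this line of work studies: in a wide random tanh network, the squared length q of a hidden representation evolves layer by layer under a Gaussian average, and its fixed point depends on the weight variance. The nonlinearity, variances, and Monte Carlo estimate below are my own illustrative choices, not the paper's calculation.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_000)               # samples for the Gaussian average E_z[...]

    def length_map(q, sigma_w, sigma_b):
        # q_{l+1} = sigma_w^2 * E[ tanh(sqrt(q_l) * z)^2 ] + sigma_b^2,  z ~ N(0, 1)
        return sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2

    for sigma_w in (0.9, 1.5, 3.0):
        q = 1.0
        for _ in range(30):                        # iterate the map over 30 layers
            q = length_map(q, sigma_w, sigma_b=0.05)
        print(sigma_w, round(q, 4))                # the fixed-point length grows with sigma_w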

Random projections of random manifolds

no code implementations 14 Jul 2016 Subhaneil Lahiri, Peiran Gao, Surya Ganguli

Moreover, unlike previous work, we test our theoretical bounds against numerical experiments on the actual geometric distortions that typically occur for random projections of random smooth manifolds.

Dimensionality Reduction

An equivalence between high dimensional Bayes optimal inference and M-estimation

no code implementations NeurIPS 2016 Madhu Advani, Surya Ganguli

In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure.

Vocal Bursts Intensity Prediction

Deep Information Propagation

1 code implementation 4 Nov 2016 Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks.

Survey of Expressivity in Deep Neural Networks

no code implementations 24 Nov 2016 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

This quantity grows exponentially in the depth of the network, and is responsible for the depth sensitivity observed.
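
The growing quantity referred to here is a trajectory-length measure of expressivity. A rough sketch of how one might observe such growth numerically (the circular input trajectory, random ReLU layers, width, and weight scale are my own arbitrary choices, not the paper's experiment):

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth, sigma_w = 200, 10, 2.0           # weight scale chosen above criticality

    theta = np.linspace(0.0, 2.0 * np.pi, 1000)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # a 1-D circle of inputs
    x = x @ rng.standard_normal((2, width))                 # embed it in the input layer

    def arc_length(points):
        return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

    print(0, arc_length(x))
    for layer in range(1, depth + 1):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x = np.maximum(x @ W, 0.0)                           # random ReLU layer
        print(layer, arc_length(x))                          # trajectory length grows with depth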

Deep Learning Models of the Retinal Response to Natural Scenes

no code implementations NeurIPS 2016 Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus

Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs).

Continual Learning Through Synaptic Intelligence

5 code implementations ICML 2017 Friedemann Zenke, Ben Poole, Surya Ganguli

While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning.
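
The method behind this paper accumulates a per-parameter importance along the training trajectory and penalizes later changes to parameters that were important for earlier tasks. The sketch below is a compressed toy version of that idea as I understand it; the quadratic stand-in tasks, plain SGD loop, and hyperparameters c and xi are my own choices, not the released implementation.

    import torch

    torch.manual_seed(0)
    w = torch.zeros(10, requires_grad=True)        # toy "network": a single weight vector
    c, xi, lr = 1.0, 0.1, 0.1                      # penalty strength, damping, learning rate

    def task_loss(w, target):
        return ((w - target) ** 2).sum()           # stand-in for a real task loss

    importance = torch.zeros_like(w)               # consolidated per-parameter importance
    omega = torch.zeros_like(w)                    # running path-integral contribution
    w_anchor = w.detach().clone()                  # parameters at the end of the previous task

    for target in (torch.ones(10), -torch.ones(10)):           # two sequential toy tasks
        for _ in range(100):
            total = task_loss(w, target) + c * (importance * (w - w_anchor) ** 2).sum()
            g_total, = torch.autograd.grad(total, w)
            g_task, = torch.autograd.grad(task_loss(w, target), w)
            with torch.no_grad():
                step = -lr * g_total
                omega += -g_task * step            # each parameter's contribution to the loss drop
                w += step
        with torch.no_grad():                      # consolidate at the task boundary
            importance += omega / ((w - w_anchor) ** 2 + xi)
            w_anchor = w.detach().clone()
            omega.zero_()

    print(importance)                              # parameters that mattered for past tasks score high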

Computational Efficiency, Continual Learning +1

Biologically inspired protection of deep networks from adversarial attacks

no code implementations 27 Mar 2017 Aran Nayebi, Surya Ganguli

Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme to train deep neural networks to make them robust to adversarial attacks.

Adversarial Attack, Second-order methods

SuperSpike: Supervised learning in multi-layer spiking neural networks

1 code implementation 31 May 2017 Friedemann Zenke, Surya Ganguli

In summary, our results open the door to obtaining a better scientific understanding of learning and computation in spiking neural networks by advancing our ability to train them to solve nonlinear problems involving transformations between different spatiotemporal spike-time patterns.
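
The key ingredient that makes supervised training of spiking networks tractable here is a surrogate gradient for the non-differentiable spike. A minimal sketch of that ingredient in PyTorch: the fast-sigmoid surrogate derivative follows the general recipe associated with this approach, while the steepness beta and the toy usage are my own assumptions.

    import torch

    class SurrogateSpike(torch.autograd.Function):
        beta = 10.0                                # surrogate steepness (my assumption)

        @staticmethod
        def forward(ctx, membrane):
            ctx.save_for_backward(membrane)
            return (membrane > 0).float()          # hard threshold: emit a spike or not

        @staticmethod
        def backward(ctx, grad_output):
            membrane, = ctx.saved_tensors
            # fast-sigmoid surrogate derivative: 1 / (1 + beta * |u|)^2
            return grad_output / (1.0 + SurrogateSpike.beta * membrane.abs()) ** 2

    spike = SurrogateSpike.apply
    u = torch.randn(5, requires_grad=True)         # toy membrane potentials
    spike(u).sum().backward()
    print(u.grad)                                  # nonzero gradients despite the hard threshold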

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation NeurIPS 2017 Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

no code implementations NeurIPS 2017 Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed.

The Emergence of Spectral Universality in Deep Networks

1 code implementation 27 Feb 2018 Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude.
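
The contrast is easiest to see in the simplest deep linear setting, where the input-output Jacobian is just the product of the weight matrices. The sketch below compares the singular value spectrum under orthogonal and Gaussian initialization; the widths and depth are my own arbitrary choices, and the paper's analysis goes well beyond this linear case.

    import numpy as np

    rng = np.random.default_rng(0)
    n, depth = 128, 20

    def jacobian_singular_values(init):
        J = np.eye(n)
        for _ in range(depth):
            if init == "orthogonal":
                W, _ = np.linalg.qr(rng.standard_normal((n, n)))
            else:                                  # scaled Gaussian initialization
                W = rng.standard_normal((n, n)) / np.sqrt(n)
            J = W @ J                              # linear net: Jacobian = product of weight matrices
        return np.linalg.svd(J, compute_uv=False)

    for init in ("orthogonal", "gaussian"):
        s = jacobian_singular_values(init)
        print(init, float(s.min()), float(s.max()))    # orthogonal: all 1; Gaussian: widely spread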

Task-Driven Convolutional Recurrent Models of the Visual System

1 code implementation NeurIPS 2018 Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins

Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet.

General Classification, Object Recognition

An analytic theory of generalization dynamics and transfer learning in deep linear networks

no code implementations ICLR 2019 Andrew K. Lampinen, Surya Ganguli

However we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks.

Multi-Task Learning

Statistical mechanics of low-rank tensor decomposition

1 code implementation NeurIPS 2018 Jonathan Kadmon, Surya Ganguli

Often, large, high dimensional datasets collected across multiple modalities can be organized as a higher order tensor.

Tensor Decomposition

A mathematical theory of semantic development in deep neural networks

1 code implementation 23 Oct 2018 Andrew M. Saxe, James L. McClelland, Surya Ganguli

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?

Semantic Similarity, Semantic Textual Similarity

The emergence of multiple retinal cell types through efficient coding of natural movies

no code implementations NeurIPS 2018 Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny

Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types.

Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

no code implementations NeurIPS 2019 Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task.

General Classification, Sentiment Analysis +1

Universality and individuality in neural dynamics across large populations of recurrent networks

no code implementations NeurIPS 2019 Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

To address these foundational questions, we study populations of thousands of networks, with commonly used RNN architectures, trained to solve neuroscientifically motivated tasks and characterize their nonlinear dynamics.

Revealing computational mechanisms of retinal prediction via model reduction

no code implementations NeurIPS Workshop Neuro_AI 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.

Dimensionality Reduction

Emergent properties of the local geometry of neural loss landscapes

no code implementations 14 Oct 2019 Stanislav Fort, Surya Ganguli

The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions as well as dramatically impact the practical success of neural network training.

A unified theory for the origin of grid cells through the lens of pattern formation

1 code implementation NeurIPS 2019 Ben Sorscher, Gabriel Mel, Surya Ganguli, Samuel Ocko

This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks.

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

1 code implementation NeurIPS 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.

Dimensionality Reduction

Two Routes to Scalable Credit Assignment without Weight Symmetry

1 code implementation ICML 2020 Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport: the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another.
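
Feedback alignment, which routes errors backward through a fixed random matrix instead of the transposed forward weights, is one well-known way to avoid weight transport and is representative of the alternatives examined in this line of work. The sketch below is my own toy regression example (the sizes, learning rate, and data are arbitrary), not the paper's code.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, lr = 20, 64, 5, 0.05

    W1 = rng.standard_normal((n_in, n_hid)) * 0.1
    W2 = rng.standard_normal((n_hid, n_out)) * 0.1
    B = rng.standard_normal((n_out, n_hid)) * 0.1        # fixed random feedback weights (never trained)

    X = rng.standard_normal((256, n_in))
    Y = np.sin(X @ rng.standard_normal((n_in, n_out)))   # toy regression targets

    for step in range(500):
        H = np.tanh(X @ W1)
        err = H @ W2 - Y                                  # dLoss/dOutput for 0.5 * squared error
        dH = (err @ B) * (1.0 - H ** 2)                   # error routed through B, not W2.T
        W2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ dH / len(X)

    print(float(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)))   # final training error of this toy run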

Vocal Bursts Valence Prediction

Pruning neural networks without any data by iteratively conserving synaptic flow

5 code implementations NeurIPS 2020 Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time.
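
A compressed, single-round sketch of the synaptic-flow scoring idea: score each weight by the gradient of a data-free objective (an all-ones input pushed through the network with absolute-valued weights) times the weight itself, then prune the lowest-scoring weights. The tiny MLP, layer sizes, and 50% pruning fraction below are my own choices, and the actual method repeats this prune-and-rescore loop over many rounds.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    # Data-free objective: push an all-ones input through the network with
    # every weight replaced by its absolute value, and sum the output.
    signed = [p.data.clone() for p in model.parameters()]
    for p in model.parameters():
        p.data = p.data.abs()
    model(torch.ones(1, 32)).sum().backward()

    scores = {}
    for (name, p), orig in zip(model.named_parameters(), signed):
        scores[name] = (p.grad * p.data).abs()            # synaptic flow score per parameter
        p.data = orig                                      # restore the signed weights
        p.grad = None

    # One pruning round: zero out the lower-scoring half of each weight matrix.
    for name, p in model.named_parameters():
        if "weight" in name:
            s = scores[name]
            threshold = s.flatten().kthvalue(s.numel() // 2).values
            p.data[s <= threshold] = 0.0

    print([int((p == 0).sum()) for p in model.parameters()])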

Predictive coding in balanced neural networks with noise, chaos and delays

no code implementations NeurIPS 2020 Jonathan Kadmon, Jonathan Timcheck, Surya Ganguli

However, the theoretical principles governing the efficacy of balanced predictive coding and its robustness to noise, synaptic weight heterogeneity and communication delays remain poorly understood.

Understanding Self-supervised Learning with Dual Deep Networks

2 code implementations 1 Oct 2020 Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).

Self-Supervised Learning

Identifying Learning Rules From Neural Network Observables

2 code implementations NeurIPS 2020 Aran Nayebi, Sanjana Srivastava, Surya Ganguli, Daniel L. K. Yamins

We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and held-out architectures and learning curricula.

Open-Ended Question Answering

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

1 code implementation 8 Dec 2020 Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.

Slice, Dice, and Optimize: Measuring the Dimension of Neural Network Class Manifolds

no code implementations 1 Jan 2021 Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel Stern Schoenholz

Deep neural network classifiers naturally partition input space into regions belonging to different classes.

Symmetry, Conservation Laws, and Learning Dynamics in Neural Networks

no code implementations ICLR 2021 Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel LK Yamins, Hidenori Tanaka

Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.

Embodied Intelligence via Learning and Evolution

1 code implementation 3 Feb 2021 Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.

Understanding self-supervised Learning Dynamics without Contrastive Pairs

5 code implementations 12 Feb 2021 Yuandong Tian, Xinlei Chen, Surya Ganguli

While contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, relying instead on an extra learnable predictor and a stop-gradient operation.
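
A minimal sketch of the non-contrastive recipe discussed here: two augmented views, a learnable predictor on one branch, a stop-gradient on the other, and a negative cosine loss. The tiny encoder, stand-in "augmentation", and dimensions below are my own placeholders, not the paper's setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    predictor = nn.Linear(16, 16)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

    def augment(x):                                # stand-in for real data augmentation
        return x + 0.1 * torch.randn_like(x)

    x = torch.randn(128, 32)                       # toy "images"
    for step in range(100):
        z1, z2 = encoder(augment(x)), encoder(augment(x))
        p1, p2 = predictor(z1), predictor(z2)
        # Negative cosine similarity, with a stop-gradient on the target branch.
        loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(float(loss))                             # final loss of this toy run; no negative pairs were used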

Self-Supervised Learning

How many degrees of freedom do we need to train deep networks: a loss landscape perspective

1 code implementation ICLR 2022 Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

In particular, we show, via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large.

Deep Learning on a Data Diet: Finding Important Examples Early in Training

1 code implementation NeurIPS 2021 Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

Compared to recent work that prunes data by discarding examples that are rarely forgotten over the course of training, our scores use only local information early in training.
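
A rough sketch of an EL2N-style early-training score: the L2 norm of the error vector between predicted probabilities and the one-hot label. The toy classifier, the use of a single checkpoint rather than an average over several short runs, and the fraction of data kept are my own simplifications.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Linear(20, 5)                 # stands in for a network a few epochs into training
    x = torch.randn(1000, 20)
    y = torch.randint(0, 5, (1000,))

    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
        onehot = F.one_hot(y, num_classes=5).float()
        el2n = (probs - onehot).norm(dim=-1)       # L2 norm of the error vector per example

    keep = el2n.argsort(descending=True)[: int(0.6 * len(x))]   # keep the "hardest" 60% of examples
    print(len(keep), float(el2n.mean()))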

Synaptic balancing: a biologically plausible local learning rule that provably increases neural network noise robustness without sacrificing task performance

no code implementations 18 Jul 2021 Christopher H. Stock, Sarah E. Harvey, Samuel A. Ocko, Surya Ganguli

We introduce a novel, biologically plausible local learning rule that provably increases the robustness of neural dynamics to noise in nonlinear recurrent neural networks with homogeneous nonlinearities.

Explaining heterogeneity in medial entorhinal cortex with task-driven neural networks

1 code implementation NeurIPS 2021 Aran Nayebi, Alexander Attinger, Malcolm Campbell, Kiah Hardcastle, Isabel Low, Caitlin Mallory, Gabriel Mel, Ben Sorscher, Alex Williams, Surya Ganguli, Lisa Giocomo, Dan Yamins

Medial entorhinal cortex (MEC) supports a wide range of navigational and memory-related behaviors. Well-known experimental results have revealed specialized cell types in MEC (e.g., grid, border, and head-direction cells) whose highly stereotypical response profiles are suggestive of the role they might play in supporting MEC functionality.

MetaMorph: Learning Universal Controllers with Transformers

2 code implementations ICLR 2022 Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei

Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning.

Zero-shot Generalization

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

1 code implementation 2 Jun 2022 Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
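
For orientation, a bare-bones sketch of an iterative-magnitude-pruning loop with weight rewinding on a toy sparse regression problem; the step counts, per-round pruning fraction, and rewind point are arbitrary choices of mine, not the paper's protocol.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((512, 50))
    w_true = np.where(rng.random(50) < 0.2, rng.standard_normal(50), 0.0)   # sparse ground truth
    y = X @ w_true

    def train(w, mask, steps=300, lr=0.05):
        for _ in range(steps):
            grad = X.T @ (X @ (w * mask) - y) / len(X)
            w = w - lr * grad * mask               # pruned weights stay at zero
        return w

    mask = np.ones(50)
    w = train(rng.standard_normal(50) * 0.1, mask, steps=30)   # brief dense training
    w_rewind = w.copy()                                        # rewind point, a few steps in

    for round_ in range(4):                        # each round: train, prune 30% of survivors, rewind
        w = train(w, mask)
        threshold = np.quantile(np.abs(w[mask == 1]), 0.3)
        mask[np.abs(w) < threshold] = 0.0
        w = w_rewind.copy()

    w = train(w, mask)
    print(int(mask.sum()), float(np.mean((X @ (w * mask) - y) ** 2)))   # surviving weights and final MSE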

Beyond neural scaling laws: beating power law scaling via data pruning

3 code implementations 29 Jun 2022 Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.

Benchmarking

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

no code implementations 6 Oct 2022 Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP.

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

no code implementations 7 Oct 2022 Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics.

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

1 code implementation 16 Mar 2023 Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
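
A condensed sketch of the deduplication step: embed examples, then drop all but one member of any group whose embeddings are nearly identical. The real pipeline uses pretrained-model embeddings and clusters first so that pairwise comparisons stay tractable; here I use random "embeddings" with planted near-duplicates, a brute-force pairwise pass, and an arbitrary similarity threshold.

    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.standard_normal((200, 64))
    emb[50:60] = emb[40:50] + 0.01 * rng.standard_normal((10, 64))   # plant 10 near-duplicates

    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)           # unit norm: cosine sim = dot product
    sim = emb @ emb.T
    threshold = 0.95                                                 # similarity cutoff (arbitrary)

    keep = np.ones(len(emb), dtype=bool)
    for i in range(len(emb)):
        if not keep[i]:
            continue
        too_similar = (sim[i] > threshold) & (np.arange(len(emb)) > i)
        keep[too_similar] = False                                    # keep one representative per group

    print(int(keep.sum()), "of", len(emb), "examples kept")          # the planted duplicates are dropped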

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

1 code implementation NeurIPS 2023 Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization.

Geometric Dynamics of Signal Propagation Predict Trainability of Transformers

no code implementations 5 Mar 2024 Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli

Our update equations show that without MLP layers, this system will collapse to a line, consistent with prior work on rank collapse in transformers.
