Search Results for author: Surya Ganguli

Found 69 papers, 36 papers with code

Short-term memory in neuronal networks through dynamical compressed sensing

no code implementations NeurIPS 2010 Surya Ganguli, Haim Sompolinsky

Prior work, in the case of Gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can outperform an equivalent feedforward network.

Fast large-scale optimization by unifying stochastic gradient and quasi-Newton methods

1 code implementation 9 Nov 2013 Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli

This algorithm contrasts with earlier stochastic second order techniques that treat the Hessian of each contributing function as a noisy approximation to the full Hessian, rather than as a target for direct estimation.

Computational Efficiency

A memory frontier for complex synapses

no code implementations NeurIPS 2013 Subhaneil Lahiri, Surya Ganguli

An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses.

Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

3 code implementations 20 Dec 2013 Andrew M. Saxe, James L. McClelland, Surya Ganguli

We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times.
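
As a minimal illustration of the random orthogonal initialization mentioned above, one can draw a Haar-distributed orthogonal weight matrix from the QR decomposition of a Gaussian matrix; the sketch below is my own (the size and gain are arbitrary choices), not the paper's code.

    import numpy as np

    def random_orthogonal(n, gain=1.0, seed=0):
        # QR-decompose a Gaussian matrix and fix column signs so the result
        # is approximately uniform over the orthogonal group.
        rng = np.random.default_rng(seed)
        q, r = np.linalg.qr(rng.standard_normal((n, n)))
        q *= np.sign(np.diag(r))                   # sign correction, broadcast over columns
        return gain * q

    W = random_orthogonal(256)
    print(np.allclose(W.T @ W, np.eye(256)))       # True: W is numerically orthogonal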

Unsupervised Pre-training

On the saddle point problem for non-convex optimization

no code implementations 19 May 2014 Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.

Analyzing noise in autoencoders and deep networks

no code implementations 6 Jun 2014 Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli

Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features.

Denoising

Identifying and attacking the saddle point problem in high-dimensional non-convex optimization

4 code implementations NeurIPS 2014 Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio

Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.

Deep Unsupervised Learning using Nonequilibrium Thermodynamics

6 code implementations 12 Mar 2015 Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli

A central problem in machine learning involves modeling complex data-sets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable.
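
This is the paper that introduced the diffusion-model framework later popularized for image generation. Below is a minimal sketch of the forward (noising) half of such a model, in which data are gradually destroyed by Gaussian noise; the linear beta schedule, number of steps, and 2-D toy data are my own arbitrary choices, and the learned reverse process is omitted.

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 2e-2, T)             # noise schedule (arbitrary choice)
    alpha_bar = np.cumprod(1.0 - betas)            # running product of (1 - beta_t)

    x0 = rng.standard_normal((512, 2)) * np.array([2.0, 0.2])   # anisotropic toy "data"

    def q_sample(x0, t):
        # Closed form of the forward process: x_t ~ N(sqrt(abar_t) * x_0, (1 - abar_t) * I)
        eps = rng.standard_normal(x0.shape)
        return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

    for t in (0, 100, 999):
        print(t, q_sample(x0, t).std(axis=0))      # per-dimension spread approaches 1 as structure is destroyed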

Deep Knowledge Tracing

6 code implementations NeurIPS 2015 Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein

Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is a well-established problem in computer-supported education.

Knowledge Tracing

Statistical Mechanics of High-Dimensional Inference

no code implementations 18 Jan 2016 Madhu Advani, Surya Ganguli

Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference cannot achieve these limits.

Bayesian Inference, Vocal Bursts Intensity Prediction

A universal tradeoff between power, precision and speed in physical communication

no code implementations 24 Mar 2016 Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli

Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal.

Friction

On the Expressive Power of Deep Neural Networks

no code implementations ICML 2017 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.

Exponential expressivity in deep neural networks through transient chaos

1 code implementation NeurIPS 2016 Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli

We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights.
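
A rough numerical sketch of the kind of mean-field recursion this line of work studies: in a wide random tanh network, the squared length q of a hidden representation evolves layer by layer under a Gaussian average, and its fixed point depends on the weight variance. The nonlinearity, variances, and Monte Carlo estimate below are my own illustrative choices, not the paper's calculation.

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.standard_normal(100_000)               # samples for the Gaussian average E_z[...]

    def length_map(q, sigma_w, sigma_b):
        # q_{l+1} = sigma_w^2 * E[ tanh(sqrt(q_l) * z)^2 ] + sigma_b^2,  z ~ N(0, 1)
        return sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z) ** 2) + sigma_b**2

    for sigma_w in (0.9, 1.5, 3.0):
        q = 1.0
        for _ in range(30):                        # iterate the map over 30 layers
            q = length_map(q, sigma_w, sigma_b=0.05)
        print(sigma_w, round(q, 4))                # the fixed-point length grows with sigma_w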

Random projections of random manifolds

no code implementations 14 Jul 2016 Subhaneil Lahiri, Peiran Gao, Surya Ganguli

Moreover, unlike previous work, we test our theoretical bounds against numerical experiments on the actual geometric distortions that typically occur for random projections of random smooth manifolds.

Dimensionality Reduction

An equivalence between high dimensional Bayes optimal inference and M-estimation

no code implementations NeurIPS 2016 Madhu Advani, Surya Ganguli

In this work we demonstrate, when the signal distribution and the likelihood function associated with the noise are both log-concave, that optimal MMSE performance is asymptotically achievable via another M-estimation procedure.

Vocal Bursts Intensity Prediction

Deep Information Propagation

1 code implementation 4 Nov 2016 Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein

We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks.

Survey of Expressivity in Deep Neural Networks

no code implementations 24 Nov 2016 Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein

This quantity grows exponentially in the depth of the network, and is responsible for the depth sensitivity observed.
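
The growing quantity referred to here is a trajectory-length measure of expressivity. A rough sketch of how one might observe such growth numerically (the circular input trajectory, random ReLU layers, width, and weight scale are my own arbitrary choices, not the paper's experiment):

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth, sigma_w = 200, 10, 2.0           # weight scale chosen above criticality

    theta = np.linspace(0.0, 2.0 * np.pi, 1000)
    x = np.stack([np.cos(theta), np.sin(theta)], axis=1)   # a 1-D circle of inputs
    x = x @ rng.standard_normal((2, width))                 # embed it in the input layer

    def arc_length(points):
        return float(np.sum(np.linalg.norm(np.diff(points, axis=0), axis=1)))

    print(0, arc_length(x))
    for layer in range(1, depth + 1):
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x = np.maximum(x @ W, 0.0)                           # random ReLU layer
        print(layer, arc_length(x))                          # trajectory length grows with depth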

Deep Learning Models of the Retinal Response to Natural Scenes

no code implementations NeurIPS 2016 Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus

Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs).

Continual Learning Through Synaptic Intelligence

5 code implementations ICML 2017 Friedemann Zenke, Ben Poole, Surya Ganguli

While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning.
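
The method behind this paper accumulates a per-parameter importance along the training trajectory and penalizes later changes to parameters that were important for earlier tasks. The sketch below is a compressed toy version of that idea as I understand it; the quadratic stand-in tasks, plain SGD loop, and hyperparameters c and xi are my own choices, not the released implementation.

    import torch

    torch.manual_seed(0)
    w = torch.zeros(10, requires_grad=True)        # toy "network": a single weight vector
    c, xi, lr = 1.0, 0.1, 0.1                      # penalty strength, damping, learning rate

    def task_loss(w, target):
        return ((w - target) ** 2).sum()           # stand-in for a real task loss

    importance = torch.zeros_like(w)               # consolidated per-parameter importance
    omega = torch.zeros_like(w)                    # running path-integral contribution
    w_anchor = w.detach().clone()                  # parameters at the end of the previous task

    for target in (torch.ones(10), -torch.ones(10)):           # two sequential toy tasks
        for _ in range(100):
            total = task_loss(w, target) + c * (importance * (w - w_anchor) ** 2).sum()
            g_total, = torch.autograd.grad(total, w)
            g_task, = torch.autograd.grad(task_loss(w, target), w)
            with torch.no_grad():
                step = -lr * g_total
                omega += -g_task * step            # each parameter's contribution to the loss drop
                w += step
        with torch.no_grad():                      # consolidate at the task boundary
            importance += omega / ((w - w_anchor) ** 2 + xi)
            w_anchor = w.detach().clone()
            omega.zero_()

    print(importance)                              # parameters that mattered for past tasks score high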

Computational Efficiency, Continual Learning +1

Biologically inspired protection of deep networks from adversarial attacks

no code implementations 27 Mar 2017 Aran Nayebi, Surya Ganguli

Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme to train deep neural networks to make them robust to adversarial attacks.

Adversarial Attack, Second-order methods

SuperSpike: Supervised learning in multi-layer spiking neural networks

1 code implementation 31 May 2017 Friedemann Zenke, Surya Ganguli

In summary, our results open the door to obtaining a better scientific understanding of learning and computation in spiking neural networks by advancing our ability to train them to solve nonlinear problems involving transformations between different spatiotemporal spike-time patterns.
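
The key ingredient that makes supervised training of spiking networks tractable here is a surrogate gradient for the non-differentiable spike. A minimal sketch of that ingredient in PyTorch: the fast-sigmoid surrogate derivative follows the general recipe associated with this approach, while the steepness beta and the toy usage are my own assumptions.

    import torch

    class SurrogateSpike(torch.autograd.Function):
        beta = 10.0                                # surrogate steepness (my assumption)

        @staticmethod
        def forward(ctx, membrane):
            ctx.save_for_backward(membrane)
            return (membrane > 0).float()          # hard threshold: emit a spike or not

        @staticmethod
        def backward(ctx, grad_output):
            membrane, = ctx.saved_tensors
            # fast-sigmoid surrogate derivative: 1 / (1 + beta * |u|)^2
            return grad_output / (1.0 + SurrogateSpike.beta * membrane.abs()) ** 2

    spike = SurrogateSpike.apply
    u = torch.randn(5, requires_grad=True)         # toy membrane potentials
    spike(u).sum().backward()
    print(u.grad)                                  # nonzero gradients despite the hard threshold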

Variational Walkback: Learning a Transition Operator as a Stochastic Recurrent Net

1 code implementation NeurIPS 2017 Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio

The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.

Resurrecting the sigmoid in deep learning through dynamical isometry: theory and practice

no code implementations NeurIPS 2017 Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed.

The Emergence of Spectral Universality in Deep Networks

1 code implementation 27 Feb 2018 Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli

Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude.
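
The contrast is easiest to see in the simplest deep linear setting, where the input-output Jacobian is just the product of the weight matrices. The sketch below compares the singular value spectrum under orthogonal and Gaussian initialization; the widths and depth are my own arbitrary choices, and the paper's analysis goes well beyond this linear case.

    import numpy as np

    rng = np.random.default_rng(0)
    n, depth = 128, 20

    def jacobian_singular_values(init):
        J = np.eye(n)
        for _ in range(depth):
            if init == "orthogonal":
                W, _ = np.linalg.qr(rng.standard_normal((n, n)))
            else:                                  # scaled Gaussian initialization
                W = rng.standard_normal((n, n)) / np.sqrt(n)
            J = W @ J                              # linear net: Jacobian = product of weight matrices
        return np.linalg.svd(J, compute_uv=False)

    for init in ("orthogonal", "gaussian"):
        s = jacobian_singular_values(init)
        print(init, float(s.min()), float(s.max()))    # orthogonal: all 1; Gaussian: widely spread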

Task-Driven Convolutional Recurrent Models of the Visual System

1 code implementation NeurIPS 2018 Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins

Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet.

General Classification, Object Recognition

An analytic theory of generalization dynamics and transfer learning in deep linear networks

no code implementations ICLR 2019 Andrew K. Lampinen, Surya Ganguli

However we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks.

Multi-Task Learning

Statistical mechanics of low-rank tensor decomposition

1 code implementation NeurIPS 2018 Jonathan Kadmon, Surya Ganguli

Often, large, high dimensional datasets collected across multiple modalities can be organized as a higher order tensor.

Tensor Decomposition

A mathematical theory of semantic development in deep neural networks

1 code implementation 23 Oct 2018 Andrew M. Saxe, James L. McClelland, Surya Ganguli

An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?

Semantic Similarity, Semantic Textual Similarity

The emergence of multiple retinal cell types through efficient coding of natural movies

no code implementations NeurIPS 2018 Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny

Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types.

Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics

no code implementations NeurIPS 2019 Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task.

General Classification, Sentiment Analysis +1

Universality and individuality in neural dynamics across large populations of recurrent networks

no code implementations NeurIPS 2019 Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo

To address these foundational questions, we study populations of thousands of networks, with commonly used RNN architectures, trained to solve neuroscientifically motivated tasks and characterize their nonlinear dynamics.

Revealing computational mechanisms of retinal prediction via model reduction

no code implementations NeurIPS Workshop Neuro_AI 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.

Dimensionality Reduction

Emergent properties of the local geometry of neural loss landscapes

no code implementations 14 Oct 2019 Stanislav Fort, Surya Ganguli

The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions as well as dramatically impact the practical success of neural network training.

A unified theory for the origin of grid cells through the lens of pattern formation

1 code implementation NeurIPS 2019 Ben Sorscher, Gabriel Mel, Surya Ganguli, Samuel Ocko

This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks.

From deep learning to mechanistic understanding in neuroscience: the structure of retinal prediction

1 code implementation NeurIPS 2019 Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli

Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.

Dimensionality Reduction

Two Routes to Scalable Credit Assignment without Weight Symmetry

1 code implementation ICML 2020 Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins

The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport: the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another.
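
Feedback alignment, which routes errors backward through a fixed random matrix instead of the transposed forward weights, is one well-known way to avoid weight transport and is representative of the alternatives examined in this line of work. The sketch below is my own toy regression example (the sizes, learning rate, and data are arbitrary), not the paper's code.

    import numpy as np

    rng = np.random.default_rng(0)
    n_in, n_hid, n_out, lr = 20, 64, 5, 0.05

    W1 = rng.standard_normal((n_in, n_hid)) * 0.1
    W2 = rng.standard_normal((n_hid, n_out)) * 0.1
    B = rng.standard_normal((n_out, n_hid)) * 0.1        # fixed random feedback weights (never trained)

    X = rng.standard_normal((256, n_in))
    Y = np.sin(X @ rng.standard_normal((n_in, n_out)))   # toy regression targets

    for step in range(500):
        H = np.tanh(X @ W1)
        err = H @ W2 - Y                                  # dLoss/dOutput for 0.5 * squared error
        dH = (err @ B) * (1.0 - H ** 2)                   # error routed through B, not W2.T
        W2 -= lr * H.T @ err / len(X)
        W1 -= lr * X.T @ dH / len(X)

    print(float(np.mean((np.tanh(X @ W1) @ W2 - Y) ** 2)))   # final training error of this toy run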

Vocal Bursts Valence Prediction

Pruning neural networks without any data by iteratively conserving synaptic flow

5 code implementations NeurIPS 2020 Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli

Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time.
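
A compressed, single-round sketch of the synaptic-flow scoring idea: score each weight by the gradient of a data-free objective (an all-ones input pushed through the network with absolute-valued weights) times the weight itself, then prune the lowest-scoring weights. The tiny MLP, layer sizes, and 50% pruning fraction below are my own choices, and the actual method repeats this prune-and-rescore loop over many rounds.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))

    # Data-free objective: push an all-ones input through the network with
    # every weight replaced by its absolute value, and sum the output.
    signed = [p.data.clone() for p in model.parameters()]
    for p in model.parameters():
        p.data = p.data.abs()
    model(torch.ones(1, 32)).sum().backward()

    scores = {}
    for (name, p), orig in zip(model.named_parameters(), signed):
        scores[name] = (p.grad * p.data).abs()            # synaptic flow score per parameter
        p.data = orig                                      # restore the signed weights
        p.grad = None

    # One pruning round: zero out the lower-scoring half of each weight matrix.
    for name, p in model.named_parameters():
        if "weight" in name:
            s = scores[name]
            threshold = s.flatten().kthvalue(s.numel() // 2).values
            p.data[s <= threshold] = 0.0

    print([int((p == 0).sum()) for p in model.parameters()])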

Predictive coding in balanced neural networks with noise, chaos and delays

no code implementations NeurIPS 2020 Jonathan Kadmon, Jonathan Timcheck, Surya Ganguli

However, the theoretical principles governing the efficacy of balanced predictive coding and its robustness to noise, synaptic weight heterogeneity and communication delays remain poorly understood.

Understanding Self-supervised Learning with Dual Deep Networks

2 code implementations 1 Oct 2020 Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli

We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).

Self-Supervised Learning

Identifying Learning Rules From Neural Network Observables

2 code implementations NeurIPS 2020 Aran Nayebi, Sanjana Srivastava, Surya Ganguli, Daniel L. K. Yamins

We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and held-out architectures and learning curricula.

Open-Ended Question Answering

Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics

1 code implementation 8 Dec 2020 Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka

Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.

Slice, Dice, and Optimize: Measuring the Dimension of Neural Network Class Manifolds

no code implementations 1 Jan 2021 Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel Stern Schoenholz

Deep neural network classifiers naturally partition input space into regions belonging to different classes.

Symmetry, Conservation Laws, and Learning Dynamics in Neural Networks

no code implementations ICLR 2021 Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel LK Yamins, Hidenori Tanaka

Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.

Embodied Intelligence via Learning and Evolution

1 code implementation 3 Feb 2021 Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei

However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control, remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.

Understanding self-supervised Learning Dynamics without Contrastive Pairs

5 code implementations 12 Feb 2021 Yuandong Tian, Xinlei Chen, Surya Ganguli

While contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, relying instead on an extra learnable predictor and a stop-gradient operation.
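
A minimal sketch of the non-contrastive recipe discussed here: two augmented views, a learnable predictor on one branch, a stop-gradient on the other, and a negative cosine loss. The tiny encoder, stand-in "augmentation", and dimensions below are my own placeholders, not the paper's setup.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    torch.manual_seed(0)
    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    predictor = nn.Linear(16, 16)
    opt = torch.optim.SGD(list(encoder.parameters()) + list(predictor.parameters()), lr=0.05)

    def augment(x):                                # stand-in for real data augmentation
        return x + 0.1 * torch.randn_like(x)

    x = torch.randn(128, 32)                       # toy "images"
    for step in range(100):
        z1, z2 = encoder(augment(x)), encoder(augment(x))
        p1, p2 = predictor(z1), predictor(z2)
        # Negative cosine similarity, with a stop-gradient on the target branch.
        loss = -0.5 * (F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
                       + F.cosine_similarity(p2, z1.detach(), dim=-1).mean())
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(float(loss))                             # final loss of this toy run; no negative pairs were used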

Self-Supervised Learning

How many degrees of freedom do we need to train deep networks: a loss landscape perspective

1 code implementation ICLR 2022 Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli

In particular, we show, via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large.

Deep Learning on a Data Diet: Finding Important Examples Early in Training

1 code implementation NeurIPS 2021 Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite

Compared to recent work that prunes data by discarding examples that are rarely forgotten over the course of training, our scores use only local information early in training.
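
A rough sketch of an EL2N-style early-training score: the L2 norm of the error vector between predicted probabilities and the one-hot label. The toy classifier, the use of a single checkpoint rather than an average over several short runs, and the fraction of data kept are my own simplifications.

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    model = torch.nn.Linear(20, 5)                 # stands in for a network a few epochs into training
    x = torch.randn(1000, 20)
    y = torch.randint(0, 5, (1000,))

    with torch.no_grad():
        probs = F.softmax(model(x), dim=-1)
        onehot = F.one_hot(y, num_classes=5).float()
        el2n = (probs - onehot).norm(dim=-1)       # L2 norm of the error vector per example

    keep = el2n.argsort(descending=True)[: int(0.6 * len(x))]   # keep the "hardest" 60% of examples
    print(len(keep), float(el2n.mean()))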

Synaptic balancing: a biologically plausible local learning rule that provably increases neural network noise robustness without sacrificing task performance

no code implementations 18 Jul 2021 Christopher H. Stock, Sarah E. Harvey, Samuel A. Ocko, Surya Ganguli

We introduce a novel, biologically plausible local learning rule that provably increases the robustness of neural dynamics to noise in nonlinear recurrent neural networks with homogeneous nonlinearities.

Explaining heterogeneity in medial entorhinal cortex with task-driven neural networks

1 code implementation NeurIPS 2021 Aran Nayebi, Alexander Attinger, Malcolm Campbell, Kiah Hardcastle, Isabel Low, Caitlin Mallory, Gabriel Mel, Ben Sorscher, Alex Williams, Surya Ganguli, Lisa Giocomo, Dan Yamins

Medial entorhinal cortex (MEC) supports a wide range of navigational and memory-related behaviors. Well-known experimental results have revealed specialized cell types in MEC (e.g., grid, border, and head-direction cells) whose highly stereotypical response profiles are suggestive of the role they might play in supporting MEC functionality.

MetaMorph: Learning Universal Controllers with Transformers

2 code implementations ICLR 2022 Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei

Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning.

Zero-shot Generalization

Lottery Tickets on a Data Diet: Finding Initializations with Sparse Trainable Networks

1 code implementation 2 Jun 2022 Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite

A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
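
For orientation, a bare-bones sketch of an iterative-magnitude-pruning loop with weight rewinding on a toy sparse regression problem; the step counts, per-round pruning fraction, and rewind point are arbitrary choices of mine, not the paper's protocol.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((512, 50))
    w_true = np.where(rng.random(50) < 0.2, rng.standard_normal(50), 0.0)   # sparse ground truth
    y = X @ w_true

    def train(w, mask, steps=300, lr=0.05):
        for _ in range(steps):
            grad = X.T @ (X @ (w * mask) - y) / len(X)
            w = w - lr * grad * mask               # pruned weights stay at zero
        return w

    mask = np.ones(50)
    w = train(rng.standard_normal(50) * 0.1, mask, steps=30)   # brief dense training
    w_rewind = w.copy()                                        # rewind point, a few steps in

    for round_ in range(4):                        # each round: train, prune 30% of survivors, rewind
        w = train(w, mask)
        threshold = np.quantile(np.abs(w[mask == 1]), 0.3)
        mask[np.abs(w) < threshold] = 0.0
        w = w_rewind.copy()

    w = train(w, mask)
    print(int(mask.sum()), float(np.mean((X @ (w * mask) - y) ** 2)))   # surviving weights and final MSE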

Beyond neural scaling laws: beating power law scaling via data pruning

3 code implementations 29 Jun 2022 Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos

Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.

Benchmarking

Unmasking the Lottery Ticket Hypothesis: What's Encoded in a Winning Ticket's Mask?

no code implementations 6 Oct 2022 Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite

Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP.

The Asymmetric Maximum Margin Bias of Quasi-Homogeneous Neural Networks

no code implementations 7 Oct 2022 Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli

We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics.

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

1 code implementation 16 Mar 2023 Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
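
A condensed sketch of the deduplication step: embed examples, then drop all but one member of any group whose embeddings are nearly identical. The real pipeline uses pretrained-model embeddings and clusters first so that pairwise comparisons stay tractable; here I use random "embeddings" with planted near-duplicates, a brute-force pairwise pass, and an arbitrary similarity threshold.

    import numpy as np

    rng = np.random.default_rng(0)
    emb = rng.standard_normal((200, 64))
    emb[50:60] = emb[40:50] + 0.01 * rng.standard_normal((10, 64))   # plant 10 near-duplicates

    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)           # unit norm: cosine sim = dot product
    sim = emb @ emb.T
    threshold = 0.95                                                 # similarity cutoff (arbitrary)

    keep = np.ones(len(emb), dtype=bool)
    for i in range(len(emb)):
        if not keep[i]:
            continue
        too_similar = (sim[i] > threshold) & (np.arange(len(emb)) > i)
        keep[too_similar] = False                                    # keep one representative per group

    print(int(keep.sum()), "of", len(emb), "examples kept")          # the planted duplicates are dropped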

Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

1 code implementation NeurIPS 2023 Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

In this work, we reveal a strong implicit bias of stochastic gradient descent (SGD) that drives overly expressive networks to much simpler subnetworks, thereby dramatically reducing the number of independent parameters, and improving generalization.

Geometric Dynamics of Signal Propagation Predict Trainability of Transformers

no code implementations 5 Mar 2024 Aditya Cowsik, Tamra Nebabu, Xiao-Liang Qi, Surya Ganguli

Our update equations show that without MLP layers, this system will collapse to a line, consistent with prior work on rank collapse in transformers.
