no code implementations • 16 Mar 2023 • Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos
Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
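A minimal sketch of the cluster-then-deduplicate idea behind SemDeDup, not the authors' implementation: embeddings are grouped with k-means and near-duplicates within a cluster (cosine similarity above a threshold) are dropped. The clustering backend, threshold, and embedding source below are illustrative assumptions.

```python
# Illustrative semantic deduplication sketch (not the authors' SemDeDup code).
# Assumes unit-normalized embeddings from some pretrained encoder.
import numpy as np
from sklearn.cluster import KMeans

def semantic_dedup(embeddings, n_clusters=20, threshold=0.9, seed=0):
    """Return indices of examples kept after removing semantic near-duplicates.

    Within each k-means cluster, examples whose pairwise cosine similarity
    exceeds `threshold` are treated as duplicates and only one is kept.
    """
    labels = KMeans(n_clusters=n_clusters, random_state=seed, n_init=10).fit_predict(embeddings)
    keep = []
    for c in range(n_clusters):
        idx = np.where(labels == c)[0]
        sims = embeddings[idx] @ embeddings[idx].T  # cosine similarity for unit vectors
        removed = set()
        for i in range(len(idx)):
            if i in removed:
                continue
            keep.append(idx[i])
            # Drop later cluster members that are near-duplicates of this one.
            removed.update(j for j in range(i + 1, len(idx)) if sims[i, j] > threshold)
    return np.array(sorted(keep))

# Toy usage with random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 64))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
kept = semantic_dedup(emb)
print(f"kept {len(kept)} of {len(emb)} examples")
```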
1 code implementation • 16 Nov 2022 • Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, Yuta Koreeda
We present Holistic Evaluation of Language Models (HELM) to improve the transparency of language models.
no code implementations • 15 Oct 2022 • Anthony Zador, Sean Escola, Blake Richards, Bence Ölveczky, Yoshua Bengio, Kwabena Boahen, Matthew Botvinick, Dmitri Chklovskii, Anne Churchland, Claudia Clopath, James DiCarlo, Surya Ganguli, Jeff Hawkins, Konrad Koerding, Alexei Koulakov, Yann Lecun, Timothy Lillicrap, Adam Marblestone, Bruno Olshausen, Alexandre Pouget, Cristina Savin, Terrence Sejnowski, Eero Simoncelli, Sara Solla, David Sussillo, Andreas S. Tolias, Doris Tsao
Neuroscience has long been an essential driver of progress in artificial intelligence (AI).
1 code implementation • 11 Oct 2022 • Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel S. Schoenholz
Deep neural network classifiers partition input space into high confidence regions for each class.
no code implementations • 7 Oct 2022 • Daniel Kunin, Atsushi Yamamura, Chao Ma, Surya Ganguli
We introduce the class of quasi-homogeneous models, which is expressive enough to describe nearly all neural networks with homogeneous activations, even those with biases, residual connections, and normalization layers, while structured enough to enable geometric analysis of its gradient dynamics.
no code implementations • 6 Oct 2022 • Mansheej Paul, Feng Chen, Brett W. Larsen, Jonathan Frankle, Surya Ganguli, Gintare Karolina Dziugaite
Third, we show how the flatness of the error landscape at the end of training determines a limit on the fraction of weights that can be pruned at each iteration of IMP.
no code implementations • 30 Sep 2022 • James C. R. Whittington, Will Dorrell, Surya Ganguli, Timothy E. J. Behrens
Neurons in the brain are often finely tuned for specific task variables.
no code implementations • 29 Jun 2022 • Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, Ari S. Morcos
Widely observed neural scaling laws, in which error falls off as a power of the training set size, model size, or both, have driven substantial performance improvements in deep learning.
1 code implementation • 2 Jun 2022 • Mansheej Paul, Brett W. Larsen, Surya Ganguli, Jonathan Frankle, Gintare Karolina Dziugaite
A striking observation about iterative magnitude pruning (IMP; Frankle et al. 2020) is that, after just a few hundred steps of dense training, the method can find a sparse sub-network that can be trained to the same accuracy as the dense network.
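A minimal sketch of IMP with weight rewinding, the procedure this observation concerns. The toy model, data, pruning schedule, and step counts are illustrative placeholders, not the paper's experimental setup.

```python
# Iterative magnitude pruning with rewinding, schematically.
import copy
import torch
import torch.nn as nn

def global_magnitude_mask(model, sparsity):
    """Masks that zero out the smallest-magnitude fraction `sparsity` of parameters."""
    flat = torch.cat([p.detach().abs().flatten() for p in model.parameters()])
    k = int(sparsity * flat.numel())
    threshold = flat.kthvalue(k).values if k > 0 else torch.tensor(0.0)
    return [(p.detach().abs() > threshold).float() for p in model.parameters()]

def train(model, steps, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))  # toy data
    for _ in range(steps):
        loss = nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()
        opt.step()
        if masks is not None:  # keep pruned weights pinned at zero
            with torch.no_grad():
                for p, m in zip(model.parameters(), masks):
                    p.mul_(m)

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
train(model, steps=100)                            # brief dense training
rewind_state = copy.deepcopy(model.state_dict())   # rewind point after a few steps

masks, sparsity = None, 0.0
for _ in range(3):                                 # prune / rewind / retrain rounds
    train(model, steps=200, masks=masks)
    sparsity = 1 - 0.8 * (1 - sparsity)            # remove 20% of surviving weights
    masks = global_magnitude_mask(model, sparsity)
    model.load_state_dict(rewind_state)            # rewind surviving weights
    with torch.no_grad():
        for p, m in zip(model.parameters(), masks):
            p.mul_(m)                              # re-apply the pruning mask
```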
1 code implementation • ICLR 2022 • Agrim Gupta, Linxi Fan, Surya Ganguli, Li Fei-Fei
Multiple domains like vision, natural language, and audio are witnessing tremendous progress by leveraging Transformers for large-scale pre-training followed by task-specific fine-tuning.
1 code implementation • NeurIPS 2021 • Aran Nayebi, Alexander Attinger, Malcolm Campbell, Kiah Hardcastle, Isabel Low, Caitlin Mallory, Gabriel Mel, Ben Sorscher, Alex Williams, Surya Ganguli, Lisa Giocomo, Dan Yamins
Medial entorhinal cortex (MEC) supports a wide range of navigational and memory-related behaviors. Well-known experimental results have revealed specialized cell types in MEC (e.g., grid, border, and head-direction cells) whose highly stereotypical response profiles are suggestive of the role they might play in supporting MEC functionality.
no code implementations • 29 Sep 2021 • Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel LK Yamins
In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
1 code implementation • 19 Jul 2021 • Daniel Kunin, Javier Sagastuy-Brena, Lauren Gillespie, Eshed Margalit, Hidenori Tanaka, Surya Ganguli, Daniel L. K. Yamins
In this work we explore the limiting dynamics of deep neural networks trained with stochastic gradient descent (SGD).
no code implementations • 18 Jul 2021 • Christopher H. Stock, Sarah E. Harvey, Samuel A. Ocko, Surya Ganguli
We introduce a novel, biologically plausible local learning rule that provably increases the robustness of neural dynamics to noise in nonlinear recurrent neural networks with homogeneous nonlinearities.
1 code implementation • NeurIPS 2021 • Mansheej Paul, Surya Ganguli, Gintare Karolina Dziugaite
Compared to recent work that prunes data by discarding examples that are rarely forgotten over the course of training, our scores use only local information early in training.
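A minimal sketch of an early-in-training data-importance score in the spirit of this work. The specific error-norm (EL2N-style) definition, the stand-in model, and the keep fraction below are assumptions for illustration only.

```python
# Score examples with a lightly trained model and keep the hardest ones.
import torch
import torch.nn.functional as F

def error_l2_scores(model, x, y, num_classes):
    """L2 norm of (softmax prediction - one-hot label) per example;
    larger scores mark harder, more informative examples."""
    with torch.no_grad():
        probs = F.softmax(model(x), dim=1)
        onehot = F.one_hot(y, num_classes).float()
        return (probs - onehot).norm(dim=1)

# Toy usage: keep the hardest 50% of a random dataset.
model = torch.nn.Linear(20, 5)          # stand-in for a briefly trained network
x, y = torch.randn(1000, 20), torch.randint(0, 5, (1000,))
scores = error_l2_scores(model, x, y, num_classes=5)
keep = scores.argsort(descending=True)[:500]   # prune the easy (low-score) examples
```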
1 code implementation • ICLR 2022 • Brett W. Larsen, Stanislav Fort, Nic Becker, Surya Ganguli
In particular, we show via Gordon's escape theorem, that the training dimension plus the Gaussian width of the desired loss sub-level set, projected onto a unit sphere surrounding the initialization, must exceed the total number of parameters for the success probability to be large.
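A minimal sketch of the random-subspace training setup this analysis concerns: all parameters are constrained to an affine subspace theta = theta_0 + P z, where only the low-dimensional coordinates z are trained. The toy MLP, dimensions, and data are illustrative assumptions.

```python
# Train a toy 20-64-2 MLP inside a random d-dimensional affine subspace.
import torch
import torch.nn as nn

D = 64 * 21 + 2 * 65            # total parameter count of the MLP below (1474)
d = 50                          # training dimension (subspace dimensionality)

theta0 = torch.randn(D) * 0.1   # random initialization in full parameter space
P = torch.randn(D, d) / d**0.5  # fixed random projection defining the subspace
z = torch.zeros(d, requires_grad=True)  # trainable low-dimensional coordinates

def unpack(theta):
    """Reshape a flat parameter vector into weights/biases of the 20-64-2 MLP."""
    w1, b1 = theta[:20 * 64].view(64, 20), theta[20 * 64:20 * 64 + 64]
    rest = theta[20 * 64 + 64:]
    w2, b2 = rest[:64 * 2].view(2, 64), rest[64 * 2:]
    return w1, b1, w2, b2

x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))
opt = torch.optim.SGD([z], lr=0.1)
for _ in range(200):
    w1, b1, w2, b2 = unpack(theta0 + P @ z)     # theta = theta0 + P z
    logits = torch.relu(x @ w1.T + b1) @ w2.T + b2
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```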
4 code implementations • 12 Feb 2021 • Yuandong Tian, Xinlei Chen, Surya Ganguli
While contrastive approaches to self-supervised learning (SSL) learn representations by minimizing the distance between two augmented views of the same data point (positive pairs) and maximizing the distance between views from different data points (negative pairs), recent non-contrastive SSL methods (e.g., BYOL and SimSiam) show remarkable performance without negative pairs, with an extra learnable predictor and a stop-gradient operation.
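A minimal sketch of the non-contrastive setup described above (SimSiam/BYOL-style): two augmented views, a learnable predictor on one branch, and a stop-gradient on the other. The encoder, predictor sizes, and "augmentations" are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))
predictor = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 64))

def noncontrastive_loss(x1, x2):
    """Negative cosine similarity between predictor(online view) and
    stop-gradient(target view), symmetrized over the two views."""
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)
    def neg_cos(p, z):
        return -F.cosine_similarity(p, z.detach(), dim=1).mean()  # .detach() = stop-gradient
    return 0.5 * (neg_cos(p1, z2) + neg_cos(p2, z1))

x = torch.randn(16, 32)                                  # a batch of "images"
x1 = x + 0.1 * torch.randn_like(x)                       # two noisy "augmentations"
x2 = x + 0.1 * torch.randn_like(x)
loss = noncontrastive_loss(x1, x2)
loss.backward()
```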
1 code implementation • 3 Feb 2021 • Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei
However, the principles governing relations between environmental complexity, evolved morphology, and the learnability of intelligent control remain elusive, partially due to the substantial challenge of performing large-scale in silico experiments on evolution and learning.
no code implementations • 1 Jan 2021 • Stanislav Fort, Ekin Dogus Cubuk, Surya Ganguli, Samuel Stern Schoenholz
Deep neural network classifiers naturally partition input space into regions belonging to different classes.
no code implementations • ICLR 2021 • Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel LK Yamins, Hidenori Tanaka
Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.
1 code implementation • 8 Dec 2020 • Daniel Kunin, Javier Sagastuy-Brena, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
Overall, by exploiting symmetry, our work demonstrates that we can analytically describe the learning dynamics of various parameter combinations at finite learning rates and batch sizes for state-of-the-art architectures trained on any dataset.
no code implementations • NeurIPS 2020 • Stanislav Fort, Gintare Karolina Dziugaite, Mansheej Paul, Sepideh Kharaghani, Daniel M. Roy, Surya Ganguli
We study the relationship between the training dynamics of nonlinear deep networks, the geometry of the loss landscape, and the time evolution of a data-dependent neural tangent kernel (NTK).
2 code implementations • NeurIPS 2020 • Aran Nayebi, Sanjana Srivastava, Surya Ganguli, Daniel L. K. Yamins
We show that different classes of learning rules can be separated solely on the basis of aggregate statistics of the weights, activations, or instantaneous layer-wise activity changes, and that these results generalize to limited access to the trajectory and held-out architectures and learning curricula.
2 code implementations • EMNLP 2020 • John Hewitt, Michael Hahn, Surya Ganguli, Percy Liang, Christopher D. Manning
Recurrent neural networks empirically generate natural language with high syntactic fidelity.
2 code implementations • 1 Oct 2020 • Yuandong Tian, Lantao Yu, Xinlei Chen, Surya Ganguli
We propose a novel theoretical framework to understand contrastive self-supervised learning (SSL) methods that employ dual pairs of deep ReLU networks (e.g., SimCLR).
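A minimal sketch of a SimCLR-style contrastive (InfoNCE) loss over two augmented views of a batch. The encoder, temperature, and the simplified batch construction (positives on the diagonal, other batch examples as negatives) are illustrative assumptions, not the paper's exact setting.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(32, 128), nn.ReLU(), nn.Linear(128, 64))

def info_nce(x1, x2, temperature=0.5):
    """Each view's positive is the other augmentation of the same example;
    every other example in the batch serves as a negative."""
    z1 = F.normalize(encoder(x1), dim=1)
    z2 = F.normalize(encoder(x2), dim=1)
    logits = z1 @ z2.T / temperature          # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))         # positives sit on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) + F.cross_entropy(logits.T, labels))

x = torch.randn(16, 32)
x1 = x + 0.1 * torch.randn_like(x)
x2 = x + 0.1 * torch.randn_like(x)
loss = info_nce(x1, x2)
loss.backward()
```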
no code implementations • NeurIPS 2020 • Jonathan Kadmon, Jonathan Timcheck, Surya Ganguli
However, the theoretical principles governing the efficacy of balanced predictive coding and its robustness to noise, synaptic weight heterogeneity and communication delays remain poorly understood.
2 code implementations • NeurIPS 2020 • Hidenori Tanaka, Daniel Kunin, Daniel L. K. Yamins, Surya Ganguli
Pruning the parameters of deep neural networks has generated intense interest due to potential savings in time, memory and energy both during training and at test time.
1 code implementation • ICML 2020 • Daniel Kunin, Aran Nayebi, Javier Sagastuy-Brena, Surya Ganguli, Jonathan M. Bloom, Daniel L. K. Yamins
The neural plausibility of backpropagation has long been disputed, primarily for its use of non-local weight transport: the biologically dubious requirement that one neuron instantaneously measure the synaptic weights of another.
1 code implementation • NeurIPS 2019 • Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli
Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.
1 code implementation • NeurIPS 2019 • Ben Sorscher, Gabriel Mel, Surya Ganguli, Samuel Ocko
This theory provides insight into the optimal solutions of diverse formulations of the normative task, and shows that symmetries in the representation of space correctly predict the structure of learned firing fields in trained neural networks.
no code implementations • 14 Oct 2019 • Stanislav Fort, Surya Ganguli
The local geometry of high dimensional neural network loss landscapes can both challenge our cherished theoretical intuitions and dramatically impact the practical success of neural network training.
no code implementations • NeurIPS Workshop Neuro_AI 2019 • Hidenori Tanaka, Aran Nayebi, Niru Maheswaranathan, Lane McIntosh, Stephen A. Baccus, Surya Ganguli
Thus overall, this work not only yields insights into the computational mechanisms underlying the striking predictive capabilities of the retina, but also places the framework of deep networks as neuroscientific models on firmer theoretical foundations, by providing a new roadmap to go beyond comparing neural representations toward extracting and understanding computational mechanisms.
no code implementations • NeurIPS 2019 • Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo
To address these foundational questions, we study populations of thousands of networks, with commonly used RNN architectures, trained to solve neuroscientifically motivated tasks and characterize their nonlinear dynamics.
no code implementations • 29 Jun 2019 • Anthony Degleris, Ben Antin, Surya Ganguli, Alex H. Williams
Identifying recurring patterns in high-dimensional time series data is an important problem in many scientific domains.
no code implementations • NeurIPS 2019 • Niru Maheswaranathan, Alex Williams, Matthew D. Golub, Surya Ganguli, David Sussillo
In this work, we use tools from dynamical systems analysis to reverse engineer recurrent networks trained to perform sentiment classification, a foundational natural language processing task.
no code implementations • ICML Workshop Deep_Phenomen 2019 • Niru Maheswaranathan, Alex H. Williams, Matthew D. Golub, Surya Ganguli, David Sussillo
Recurrent neural networks (RNNs) are a powerful tool for modeling sequential data.
no code implementations • ICLR 2019 • Jack Lindsey, Samuel A. Ocko, Surya Ganguli, Stephane Deny
Neural representations vary drastically across the first stages of visual processing.
1 code implementation • 3 Jan 2019 • Jack Lindsey, Samuel A. Ocko, Surya Ganguli, Stephane Deny
The visual system is hierarchically organized to process visual information in successive stages.
no code implementations • NeurIPS 2018 • Samuel Ocko, Jack Lindsey, Surya Ganguli, Stephane Deny
Also, we train a nonlinear encoding model with a rectifying nonlinearity to efficiently encode naturalistic movies, and again find emergent receptive fields resembling those of midget and parasol cells that are now further subdivided into ON and OFF types.
1 code implementation • 23 Oct 2018 • Andrew M. Saxe, James L. McClelland, Surya Ganguli
An extensive body of empirical research has revealed remarkable regularities in the acquisition, organization, deployment, and neural representation of human semantic knowledge, thereby raising a fundamental conceptual question: what are the theoretical principles governing the ability of neural networks to acquire, organize, and deploy abstract knowledge by integrating across many individual experiences?
1 code implementation • NeurIPS 2018 • Jonathan Kadmon, Surya Ganguli
Often, large, high-dimensional datasets collected across multiple modalities can be organized as a higher-order tensor.
no code implementations • ICLR 2019 • Andrew K. Lampinen, Surya Ganguli
However we lack analytic theories that can quantitatively predict how the degree of knowledge transfer depends on the relationship between the tasks.
1 code implementation • NeurIPS 2018 • Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J. DiCarlo, Daniel L. K. Yamins
Feed-forward convolutional neural networks (CNNs) are currently state-of-the-art for object classification tasks such as ImageNet.
1 code implementation • 27 Feb 2018 • Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli
Recent work has shown that tight concentration of the entire spectrum of singular values of a deep network's input-output Jacobian around one at initialization can speed up learning by orders of magnitude.
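A minimal sketch of the Jacobian-spectrum comparison behind this line of work, simplified to a deep linear chain (the paper treats nonlinear networks): with orthogonal weights the input-output Jacobian's singular values concentrate at one, while with Gaussian weights they spread over orders of magnitude. Widths and depth below are illustrative.

```python
import numpy as np

def jacobian_singular_values(depth=50, width=200, orthogonal=True, seed=0):
    rng = np.random.default_rng(seed)
    J = np.eye(width)
    for _ in range(depth):
        if orthogonal:
            W, _ = np.linalg.qr(rng.normal(size=(width, width)))  # orthogonal layer
        else:
            W = rng.normal(size=(width, width)) / np.sqrt(width)  # Gaussian, variance 1/width
        J = W @ J   # Jacobian of a deep linear network is the product of its weight matrices
    return np.linalg.svd(J, compute_uv=False)

for ortho in (True, False):
    s = jacobian_singular_values(orthogonal=ortho)
    print(f"orthogonal={ortho}: singular values in [{s.min():.2e}, {s.max():.2e}]")
```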
no code implementations • NeurIPS 2017 • Jeffrey Pennington, Samuel S. Schoenholz, Surya Ganguli
It is well known that the initialization of weights in deep neural networks can have a dramatic impact on learning speed.
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Nan Rosemary Ke, Surya Ganguli, Yoshua Bengio
The energy function is then modified so the model and data distributions match, with no guarantee on the number of steps required for the Markov chain to converge.
no code implementations • 31 May 2017 • Friedemann Zenke, Surya Ganguli
In summary, our results open the door to obtaining a better scientific understanding of learning and computation in spiking neural networks by advancing our ability to train them to solve nonlinear problems involving transformations between different spatiotemporal spike-time patterns.
no code implementations • 27 Mar 2017 • Aran Nayebi, Surya Ganguli
Inspired by biophysical principles underlying nonlinear dendritic computation in neural circuits, we develop a scheme to train deep neural networks to make them robust to adversarial attacks.
4 code implementations • ICML 2017 • Friedemann Zenke, Ben Poole, Surya Ganguli
While deep learning has led to remarkable advances across diverse applications, it struggles in domains where the data distribution changes over the course of learning.
no code implementations • NeurIPS 2016 • Lane T. McIntosh, Niru Maheswaranathan, Aran Nayebi, Surya Ganguli, Stephen A. Baccus
Here we demonstrate that deep convolutional neural networks (CNNs) capture retinal responses to natural scenes nearly to within the variability of a cell's response, and are markedly more accurate than linear-nonlinear (LN) models and Generalized Linear Models (GLMs).
no code implementations • 24 Nov 2016 • Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
This quantity grows exponentially in the depth of the network, and is responsible for the depth sensitivity observed.
1 code implementation • 4 Nov 2016 • Samuel S. Schoenholz, Justin Gilmer, Surya Ganguli, Jascha Sohl-Dickstein
We show the existence of depth scales that naturally limit the maximum depth of signal propagation through these random networks.
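A minimal sketch of the mean-field recursions behind these depth scales: iterate the variance ("length") and correlation maps of a random tanh network and watch how quickly the correlation between two inputs converges to its fixed point. The weight and bias variances and the Monte Carlo estimator below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
z1, z2 = rng.normal(size=200_000), rng.normal(size=200_000)  # for Gaussian averages

def variance_map(q, sigma_w=1.5, sigma_b=0.05):
    return sigma_w**2 * np.mean(np.tanh(np.sqrt(q) * z1) ** 2) + sigma_b**2

def correlation_map(c, q, sigma_w=1.5, sigma_b=0.05):
    u1 = np.sqrt(q) * z1
    u2 = np.sqrt(q) * (c * z1 + np.sqrt(1 - c**2) * z2)   # correlated Gaussian pair
    return (sigma_w**2 * np.mean(np.tanh(u1) * np.tanh(u2)) + sigma_b**2) / q

q = 1.0
for _ in range(50):                 # length map converges to its fixed point q*
    q = variance_map(q)

c = 0.4
for layer in range(1, 31):          # correlation between two inputs, layer by layer
    c = correlation_map(c, q)
    if layer % 10 == 0:
        print(f"layer {layer}: correlation = {c:.4f}")
```

The rate at which the correlation approaches its fixed point sets the depth scale over which information about the inputs can propagate.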
no code implementations • NeurIPS 2016 • Madhu Advani, Surya Ganguli
In this work we demonstrate that, when the signal distribution and the likelihood function associated with the noise are both log-concave, optimal MMSE performance is asymptotically achievable via another M-estimation procedure.
no code implementations • 14 Jul 2016 • Subhaneil Lahiri, Peiran Gao, Surya Ganguli
Moreover, unlike previous work, we test our theoretical bounds against numerical experiments on the actual geometric distortions that typically occur for random projections of random smooth manifolds.
no code implementations • ICML 2017 • Maithra Raghu, Ben Poole, Jon Kleinberg, Surya Ganguli, Jascha Sohl-Dickstein
We propose a new approach to the problem of neural network expressivity, which seeks to characterize how structural properties of a neural network family affect the functions it is able to compute.
1 code implementation • NeurIPS 2016 • Ben Poole, Subhaneil Lahiri, Maithra Raghu, Jascha Sohl-Dickstein, Surya Ganguli
We combine Riemannian geometry with the mean field theory of high dimensional chaos to study the nature of signal propagation in generic, deep neural networks with random weights.
no code implementations • 24 Mar 2016 • Subhaneil Lahiri, Jascha Sohl-Dickstein, Surya Ganguli
Maximizing the speed and precision of communication while minimizing power dissipation is a fundamental engineering design goal.
no code implementations • 18 Jan 2016 • Madhu Advani, Surya Ganguli
Our analysis uncovers fundamental limits on the accuracy of inference in high dimensions, and reveals that widely cherished inference algorithms like maximum likelihood (ML) and maximum a posteriori (MAP) inference cannot achieve these limits.
6 code implementations • NeurIPS 2015 • Chris Piech, Jonathan Spencer, Jonathan Huang, Surya Ganguli, Mehran Sahami, Leonidas Guibas, Jascha Sohl-Dickstein
Knowledge tracing, where a machine models the knowledge of a student as they interact with coursework, is a well-established problem in computer-supported education.
Ranked #1 on Knowledge Tracing on Assistments.
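A minimal sketch in the spirit of deep knowledge tracing: an LSTM consumes a student's one-hot encoded (skill, correctness) interaction history and predicts the probability of answering each skill correctly next. The sizes, encoding details, and fake data below are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class DKT(nn.Module):
    def __init__(self, num_skills, hidden=64):
        super().__init__()
        # Input is a one-hot over 2 * num_skills: (skill id, answered correctly or not).
        self.lstm = nn.LSTM(2 * num_skills, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_skills)  # per-skill correctness logits

    def forward(self, interactions):
        h, _ = self.lstm(interactions)
        return self.out(h)        # logits for the next answer on every skill

num_skills, batch, seq_len = 50, 8, 20
model = DKT(num_skills)

# Fake interaction history: random skills and random correctness flags.
skills = torch.randint(0, num_skills, (batch, seq_len))
correct = torch.randint(0, 2, (batch, seq_len))
x = nn.functional.one_hot(skills + num_skills * correct, 2 * num_skills).float()

logits = model(x)                                   # (batch, seq_len, num_skills)
# Train to predict correctness on the skill of the *next* interaction:
next_logits = logits[:, :-1].gather(2, skills[:, 1:].unsqueeze(-1)).squeeze(-1)
loss = nn.functional.binary_cross_entropy_with_logits(next_logits, correct[:, 1:].float())
loss.backward()
```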
5 code implementations • 12 Mar 2015 • Jascha Sohl-Dickstein, Eric A. Weiss, Niru Maheswaranathan, Surya Ganguli
A central problem in machine learning involves modeling complex datasets using highly flexible families of probability distributions in which learning, sampling, inference, and evaluation are still analytically or computationally tractable.
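A minimal sketch of the forward (noising) process at the heart of diffusion probabilistic models, written in the modern DDPM-style parameterization of the idea this paper introduced: data is gradually destroyed by Gaussian noise, and a model would be trained to reverse each small step. The schedule values and shapes are illustrative.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)    # abar_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) x_0, (1 - abar_t) I) in closed form."""
    noise = torch.randn_like(x0)
    a = alphas_bar[t].sqrt().view(-1, 1)
    s = (1.0 - alphas_bar[t]).sqrt().view(-1, 1)
    return a * x0 + s * noise, noise

x0 = torch.randn(16, 32)                          # a batch of "data" vectors
t = torch.randint(0, T, (16,))
xt, eps = q_sample(x0, t)
# A denoising network eps_theta(xt, t) would be trained to predict `eps`,
# which defines the learned reverse process used for sampling.
```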
4 code implementations • NeurIPS 2014 • Yann Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.
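A minimal sketch contrasting gradient descent with a saddle-free Newton-style step (rescaling the gradient by the inverse absolute Hessian) on a simple 2-D saddle, in the spirit of the approach this paper proposes. The toy function, step sizes, and iteration count are illustrative, not the paper's experiments.

```python
import numpy as np

def grad(p):                      # gradient of f(x, y) = 2*x**2 - 0.01*y**2
    return np.array([4 * p[0], -0.02 * p[1]])

H = np.array([[4.0, 0.0], [0.0, -0.02]])          # constant Hessian of the saddle
eigvals, eigvecs = np.linalg.eigh(H)
H_abs_inv = eigvecs @ np.diag(1.0 / np.abs(eigvals)) @ eigvecs.T  # |H|^{-1}

p_gd = np.array([1.0, 1e-3])      # start near the saddle, barely off the unstable direction
p_sf = p_gd.copy()
for _ in range(20):
    p_gd = p_gd - 0.1 * grad(p_gd)               # plain gradient descent
    p_sf = p_sf - 0.1 * H_abs_inv @ grad(p_sf)   # saddle-free Newton-style step

print("gradient descent :", p_gd)   # crawls slowly along the low-curvature escape direction
print("saddle-free step :", p_sf)   # escapes the saddle at a curvature-independent rate
```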
no code implementations • 6 Jun 2014 • Ben Poole, Jascha Sohl-Dickstein, Surya Ganguli
Autoencoders have emerged as a useful framework for unsupervised learning of internal representations, and a wide variety of apparently conceptually disparate regularization techniques have been proposed to generate useful features.
no code implementations • 19 May 2014 • Razvan Pascanu, Yann N. Dauphin, Surya Ganguli, Yoshua Bengio
Gradient descent or quasi-Newton methods are almost ubiquitously used to perform such minimizations, and it is often thought that a main source of difficulty for these local methods in finding the global minimum is the proliferation of local minima with much higher error than the global minimum.
3 code implementations • 20 Dec 2013 • Andrew M. Saxe, James L. McClelland, Surya Ganguli
We further exhibit a new class of random orthogonal initial conditions on weights that, like unsupervised pre-training, enjoys depth independent learning times.
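A minimal sketch of the random orthogonal initialization discussed above: draw a Gaussian matrix, orthogonalize it via QR, and note that a deep chain of orthogonal layers preserves input norms at initialization, unlike a Gaussian-initialized chain. The widths, depth, and linear-chain simplification are illustrative.

```python
import numpy as np

def orthogonal_init(fan_out, fan_in, rng):
    a = rng.normal(size=(fan_out, fan_in))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))     # sign fix so the matrix is Haar-distributed

rng = np.random.default_rng(0)
width, depth = 128, 30
x = rng.normal(size=width)
x /= np.linalg.norm(x)

h = x.copy()
for _ in range(depth):                 # deep linear chain of orthogonal layers
    h = orthogonal_init(width, width, rng) @ h
print("norm after 30 orthogonal layers:", np.linalg.norm(h))  # stays exactly 1.0

h = x.copy()
for _ in range(depth):                 # Gaussian layers with variance 1/width, for contrast
    h = (rng.normal(size=(width, width)) / np.sqrt(width)) @ h
print("norm after 30 Gaussian layers  :", np.linalg.norm(h))  # typically drifts away from 1.0
```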
no code implementations • NeurIPS 2013 • Subhaneil Lahiri, Surya Ganguli
An incredible gulf separates theoretical models of synapses, often described solely by a single scalar value denoting the size of a postsynaptic potential, from the immense complexity of molecular signaling pathways underlying real synapses.
1 code implementation • 9 Nov 2013 • Jascha Sohl-Dickstein, Ben Poole, Surya Ganguli
This algorithm contrasts with earlier stochastic second order techniques that treat the Hessian of each contributing function as a noisy approximation to the full Hessian, rather than as a target for direct estimation.
no code implementations • NeurIPS 2010 • Surya Ganguli, Haim Sompolinsky
Prior work, in the case of Gaussian input sequences and linear neuronal networks, shows that the duration of memory traces in a network cannot exceed the number of neurons (in units of the neuronal time constant), and that no network can outperform an equivalent feedforward network.