no code implementations • 26 Sep 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
We develop a solvable model of neural scaling laws beyond the kernel limit.
no code implementations • 8 Aug 2024 • Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan
Recent years have seen substantial advances in our understanding of high-dimensional ridge regression, but existing theories assume that training examples are independent.
no code implementations • 7 Aug 2024 • Jacob A. Zavatone-Veth, Cengiz Pehlevan
We investigate the behavior of the Nadaraya-Watson kernel smoothing estimator in high dimensions using its relationship to the random energy model and to dense associative memories.
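For reference, the Nadaraya-Watson estimator is a kernel-weighted average of training targets; a minimal numerical sketch with a Gaussian kernel (the bandwidth `h` and the toy data are illustrative assumptions, not the paper's setting):

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, h=0.5):
    """Kernel-weighted average of training targets (Gaussian kernel, bandwidth h)."""
    d2 = np.sum((x_query[:, None, :] - x_train[None, :, :]) ** 2, axis=-1)
    w = np.exp(-d2 / (2.0 * h ** 2))            # kernel weights, shape (n_query, n_train)
    return (w @ y_train) / w.sum(axis=1)        # weighted average per query point

# Toy one-dimensional example
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, size=(200, 1))
y = np.sin(2 * x[:, 0]) + 0.1 * rng.standard_normal(200)
x_query = np.linspace(-2, 2, 5)[:, None]
print(nadaraya_watson(x, y, x_query))
```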
1 code implementation • 27 May 2024 • Sheng Yang, Jacob A. Zavatone-Veth, Cengiz Pehlevan
To this end, we propose a new spectral regularizer for representation learning that encourages black-box adversarial robustness in downstream classification tasks.
1 code implementation • 27 May 2024 • Sheng Yang, Peihan Liu, Cengiz Pehlevan
Hyperbolic spaces have increasingly been recognized for handling data with inherent hierarchical structure more effectively than their Euclidean counterparts.
1 code implementation • 24 May 2024 • William L. Tong, Cengiz Pehlevan
In-context learning (ICL), the remarkable ability to solve a task from only input exemplars, is often assumed to be a unique hallmark of Transformer models.
no code implementations • 24 May 2024 • Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.
1 code implementation • 20 May 2024 • Yue M. Lu, Mary I. Letey, Jacob A. Zavatone-Veth, Anindita Maiti, Cengiz Pehlevan
Transformers have a remarkable ability to learn and execute tasks based on examples provided within the input itself, without explicit prior training.
1 code implementation • 1 May 2024 • Alexander Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan
This paper presents a succinct derivation of the training and generalization performance of a variety of high-dimensional ridge regression models using the basic tools of random matrix theory and free probability.
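A minimal sketch of the ridge regression estimator whose high-dimensional training and test errors such random-matrix calculations characterize (the isotropic Gaussian data model, noise level, and ridge strength below are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 500, 1000, 0.1                        # samples, dimension, ridge strength
w_star = rng.standard_normal(d) / np.sqrt(d)      # ground-truth weights
X = rng.standard_normal((n, d))
y = X @ w_star + 0.1 * rng.standard_normal(n)

# Ridge estimator: w_hat = (X^T X + n*lam*I)^{-1} X^T y
w_hat = np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)

X_test = rng.standard_normal((2000, d))
y_test = X_test @ w_star
print("train MSE:", np.mean((X @ w_hat - y) ** 2))
print("test MSE: ", np.mean((X_test @ w_hat - y_test) ** 2))
```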
no code implementations • 2 Feb 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude.
no code implementations • 9 Oct 2023 • Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
We identify sufficient statistics for the test loss of such a network. Tracking these statistics over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning in which a generalizing solution is identified after the train loss is already low.
no code implementations • 28 Sep 2023 • Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan
We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
1 code implementation • NeurIPS 2023 • Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
1 code implementation • NeurIPS 2023 • Benjamin S. Ruben, Cengiz Pehlevan
Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features.
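A minimal sketch of feature bagging with ridge-regression ensemble members: each estimator is fit on a random subset of features and their predictions are averaged (the subset size, ensemble size, and ridge strength are illustrative assumptions):

```python
import numpy as np

def feature_bagged_predict(X_train, y_train, X_test, k=20, n_estimators=50, lam=1e-2, seed=0):
    """Average predictions of ridge regressors, each fit on a random subset of k features."""
    rng = np.random.default_rng(seed)
    d = X_train.shape[1]
    preds = np.zeros(X_test.shape[0])
    for _ in range(n_estimators):
        idx = rng.choice(d, size=k, replace=False)                 # random feature subset
        Xs, Xt = X_train[:, idx], X_test[:, idx]
        w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(k), Xs.T @ y_train)
        preds += Xt @ w
    return preds / n_estimators                                    # ensemble average reduces variance

# Toy usage
rng = np.random.default_rng(1)
X, w_star = rng.standard_normal((300, 100)), rng.standard_normal(100) / 10
y = X @ w_star + 0.5 * rng.standard_normal(300)
X_te = rng.standard_normal((500, 100))
print(np.mean((feature_bagged_predict(X, y, X_te) - X_te @ w_star) ** 2))
```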
1 code implementation • NeurIPS 2023 • Bariscan Bozkurt, Cengiz Pehlevan, Alper T Erdogan
Furthermore, our approach provides a natural resolution to the weight symmetry problem between forward and backward signal propagation paths, a significant critique against the plausibility of the conventional backpropagation algorithm.
1 code implementation • NeurIPS 2023 • Hamza Tahir Chaudhry, Jacob A. Zavatone-Veth, Dmitry Krotov, Cengiz Pehlevan
Sequence memory is an essential attribute of natural and artificial intelligence that enables agents to encode, store, and retrieve complex sequences of stimuli and actions.
no code implementations • NeurIPS 2023 • Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
We call this the bias of narrower width.
1 code implementation • NeurIPS 2023 • Blake Bordelon, Cengiz Pehlevan
However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.
1 code implementation • 26 Jan 2023 • Jacob A. Zavatone-Veth, Sheng Yang, Julian A. Rubinfien, Cengiz Pehlevan
This holds in deep networks trained on high-dimensional image classification tasks, and even in self-supervised representation learning.
1 code implementation • 23 Dec 2022 • Alexander Atanasov, Blake Bordelon, Sabarish Sainathan, Cengiz Pehlevan
For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime.
1 code implementation • 9 Oct 2022 • Bariscan Bozkurt, Ates Isfendiyaroglu, Cengiz Pehlevan, Alper T. Erdogan
Here, we relax this limitation and propose a biologically plausible neural network that extracts correlated latent sources by exploiting information about their domains.
no code implementations • 5 Oct 2022 • Blake Bordelon, Cengiz Pehlevan
In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices.
2 code implementations • 27 Sep 2022 • Bariscan Bozkurt, Cengiz Pehlevan, Alper T. Erdogan
Previous work on biologically plausible BSS algorithms assumed that observed signals are linear mixtures of statistically independent or uncorrelated sources, limiting the domain of applicability of these algorithms.
no code implementations • 21 Sep 2022 • David Lipshutz, Cengiz Pehlevan, Dmitri B. Chklovskii
To this end, we consider two mathematically tractable recurrent linear neural networks that statistically whiten their inputs -- one with direct recurrent connections and the other with interneurons that mediate recurrent communication.
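For reference, the statistics such circuits achieve can also be computed offline from the input covariance; a ZCA-style whitening sketch (this is only the target transform, not the recurrent network or interneuron circuit analyzed in the paper):

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Whiten rows of X so the output covariance is (approximately) the identity."""
    Xc = X - X.mean(axis=0)
    C = Xc.T @ Xc / Xc.shape[0]                                    # input covariance
    evals, evecs = np.linalg.eigh(C)
    W = evecs @ np.diag(1.0 / np.sqrt(evals + eps)) @ evecs.T      # C^{-1/2}
    return Xc @ W

rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 4)) @ rng.standard_normal((4, 4))   # correlated inputs
Y = zca_whiten(X)
print(np.round(Y.T @ Y / Y.shape[0], 2))                           # ~ identity matrix
```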
no code implementations • 14 Jun 2022 • Abdulkadir Canatar, Evan Peters, Cengiz Pehlevan, Stefan M. Wild, Ruslan Shaydulin
Quantum computers are known to provide speedups over classical state-of-the-art machine learning methods in some specialized settings.
no code implementations • 19 May 2022 • Blake Bordelon, Cengiz Pehlevan
We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory.
no code implementations • 1 Mar 2022 • Jacob A. Zavatone-Veth, William L. Tong, Cengiz Pehlevan
Moreover, we show that the leading-order correction to the kernel-limit learning curve cannot distinguish between random feature models and deep networks in which all layers are trained.
no code implementations • 12 Jan 2022 • Jacob A. Zavatone-Veth, Cengiz Pehlevan
In this short note, we reify the connection between work on the storage capacity problem in wide two-layer treelike neural networks and the rapidly growing body of literature on kernel limits of wide neural networks.
no code implementations • 23 Nov 2021 • Jacob A. Zavatone-Veth, Cengiz Pehlevan
Inference in deep Bayesian neural networks is only fully understood in the infinite-width limit, where the posterior flexibility afforded by increased depth washes out and the posterior predictive collapses to a shallow Gaussian process.
1 code implementation • NeurIPS 2021 • Trenton Bricken, Cengiz Pehlevan
While Attention has come to be an important mechanism in deep learning, there remains limited intuition for why it works so well.
no code implementations • ICLR 2022 • Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan
Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel?
1 code implementation • ICLR 2022 • Matthew Farrell, Blake Bordelon, Shubhendu Trivedi, Cengiz Pehlevan
We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action.
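For comparison, the classical Cover count gives the fraction of linearly separable dichotomies of P points in general position in N dimensions; a short sketch of that baseline formula (the paper's result concerns the group-invariant case, which this snippet does not implement):

```python
from math import comb

def separable_fraction(P, N):
    """Cover's count: fraction of the 2^P dichotomies of P points in general
    position in N dimensions that are separable by a hyperplane through the origin."""
    return 2 * sum(comb(P - 1, k) for k in range(N)) / 2 ** P

print(separable_fraction(10, 5))   # 0.5 at P = 2N, the classical capacity point
```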
1 code implementation • NeurIPS 2021 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
Here, we study generalization in kernel regression when the training and test distributions are different using methods from statistical physics.
1 code implementation • ICLR 2022 • Blake Bordelon, Cengiz Pehlevan
To analyze the influence of data structure on test loss dynamics, we study an exactly solvable model of stochastic gradient descent (SGD) on mean square loss which predicts test loss when training on features with arbitrary covariance structure.
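A minimal sketch of this kind of setup: online SGD on squared loss over features with a prescribed covariance, with test loss tracked over training (the power-law spectrum and learning rate below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, lr = 200, 1e-2
spectrum = np.arange(1, d + 1, dtype=float) ** -1.5       # assumed power-law feature covariance
w_star = rng.standard_normal(d) / np.sqrt(d)

def sample(n):
    x = rng.standard_normal((n, d)) * np.sqrt(spectrum)   # features with covariance diag(spectrum)
    return x, x @ w_star

X_test, y_test = sample(2000)
w = np.zeros(d)
for t in range(5001):
    x, y = sample(1)                                      # one fresh sample per SGD step
    w -= lr * (x[0] @ w - y[0]) * x[0]                    # gradient of 0.5 * (x.w - y)^2
    if t % 1000 == 0:
        print(t, np.mean((X_test @ w - y_test) ** 2))     # test loss along the trajectory
```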
1 code implementation • NeurIPS 2021 • Jacob A. Zavatone-Veth, Abdulkadir Canatar, Benjamin S. Ruben, Cengiz Pehlevan
However, our theoretical understanding of how the learned hidden layer representations of finite networks differ from the fixed representations of infinite networks remains incomplete.
1 code implementation • NeurIPS 2021 • Jacob A. Zavatone-Veth, Cengiz Pehlevan
For deep linear networks, the prior has a simple expression in terms of the Meijer $G$-function.
1 code implementation • 23 Oct 2020 • David Lipshutz, Cengiz Pehlevan, Dmitri B. Chklovskii
To model how the brain performs this task, we seek a biologically plausible single-layer neural network implementation of a blind source separation algorithm.
no code implementations • 21 Jul 2020 • Jacob A. Zavatone-Veth, Cengiz Pehlevan
Though a wide variety of nonlinear activation functions have been proposed for use in artificial neural networks, a detailed understanding of their role in determining the expressive power of a network has not emerged.
no code implementations • ICML 2020 • Yibo Jiang, Cengiz Pehlevan
Recent work showed that overparameterized autoencoders can be trained to implement associative memory via iterative maps, when the trained input-output Jacobian of the network has all of its eigenvalue norms strictly below one.
1 code implementation • 23 Jun 2020 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit.
1 code implementation • NeurIPS 2020 • Qianyi Li, Cengiz Pehlevan
Excitation-inhibition (E-I) balance is ubiquitously observed in the cortex.
1 code implementation • 11 Apr 2020 • Alper T. Erdogan, Cengiz Pehlevan
An important problem encountered by both natural and engineered signal processing systems is blind source separation.
no code implementations • 24 Feb 2020 • Shanshan Qin, Nayantara Mudur, Cengiz Pehlevan
We propose a novel biologically plausible solution to the credit assignment problem motivated by observations in the ventral visual pathway and trained deep neural networks.
1 code implementation • ICML 2020 • Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.
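A minimal empirical counterpart of the quantity the theory predicts: test error of kernel regression as the number of training samples grows (the RBF kernel, target function, and small ridge are illustrative assumptions):

```python
import numpy as np

def rbf_kernel(A, B, gamma=1.0):
    d2 = np.sum(A ** 2, 1)[:, None] + np.sum(B ** 2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
target = lambda x: np.sin(3 * x[:, 0]) * np.cos(2 * x[:, 1])
X_test = rng.uniform(-1, 1, (1000, 2))
y_test = target(X_test)

for P in [10, 30, 100, 300]:                                 # training set sizes
    X = rng.uniform(-1, 1, (P, 2))
    K = rbf_kernel(X, X)
    alpha = np.linalg.solve(K + 1e-6 * np.eye(P), target(X)) # (nearly) ridgeless kernel regression
    y_hat = rbf_kernel(X_test, X) @ alpha
    print(P, np.mean((y_hat - y_test) ** 2))                 # one point on the learning curve
```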
no code implementations • 11 Dec 2019 • Harshvardhan Sikka, Weishun Zhong, Jun Yin, Cengiz Pehlevan
In many data analysis tasks, it is beneficial to learn representations where each dimension is statistically independent and thus disentangled from the others.
1 code implementation • NeurIPS 2019 • Dina Obeid, Hugo Ramambason, Cengiz Pehlevan
In single-layered and all-to-all connected neural networks, local plasticity has been shown to implement gradient-based learning on a class of cost functions that contain a term that aligns the similarity of outputs to the similarity of inputs.
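One well-known cost of this form is the similarity matching objective, which penalizes the mismatch between the input and output Gram matrices; a minimal offline sketch solved by plain gradient descent (illustrative only; this is not the local plasticity rule of the networks studied in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, T = 10, 3, 500
X = rng.standard_normal((d, T))                       # inputs: d channels, T samples
G = X.T @ X                                           # input similarity (Gram) matrix

# Similarity matching: min_Y || X^T X - Y^T Y ||_F^2 over k-dimensional outputs Y
Y = 0.01 * rng.standard_normal((k, T))
for _ in range(3000):
    Y -= 1e-4 * 4 * Y @ (Y.T @ Y - G)                 # gradient step on the Frobenius objective

# At the optimum, the outputs span the top-k principal subspace of the inputs
print(np.round(np.sort(np.linalg.eigvalsh(Y @ Y.T / T))[::-1], 2))
print(np.round(np.sort(np.linalg.eigvalsh(X @ X.T / T))[::-1][:k], 2))
```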
no code implementations • 5 Aug 2019 • Cengiz Pehlevan, Dmitri B. Chklovskii
Although the currently popular deep learning networks achieve unprecedented performance on some tasks, the human brain still has a monopoly on general intelligence.
no code implementations • 4 Feb 2019 • Cengiz Pehlevan
The design and analysis of spiking neural network algorithms will be accelerated by the advent of new theoretical approaches.
1 code implementation • NeurIPS 2018 • Anirvan Sengupta, Cengiz Pehlevan, Mariano Tepper, Alexander Genkin, Dmitri Chklovskii
Many neurons in the brain, such as place cells in the rodent hippocampus, have localized receptive fields, i.e., they respond to a small neighborhood of stimulus space.
no code implementations • 6 Aug 2018 • Andrea Giovannucci, Victor Minden, Cengiz Pehlevan, Dmitri B. Chklovskii
Big data problems frequently require processing datasets in a streaming fashion, either because all data are available at once but collectively are larger than available memory or because the data intrinsically arrive one data point at a time and must be processed online.
no code implementations • 1 Jun 2017 • Cengiz Pehlevan, Sreyas Mohan, Dmitri B. Chklovskii
Blind source separation, i.e., extraction of independent sources from a mixture, is an important problem for both artificial and natural signal processing.
no code implementations • 23 Mar 2017 • Cengiz Pehlevan, Anirvan Sengupta, Dmitri B. Chklovskii
Modeling self-organization of neural networks for unsupervised learning using Hebbian and anti-Hebbian plasticity has a long history in neuroscience.
no code implementations • 11 Dec 2016 • Yuansi Chen, Cengiz Pehlevan, Dmitri B. Chklovskii
Here we propose online algorithms where the threshold is self-calibrating based on the singular values computed from the existing observations.
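An offline analogue of the idea: truncate the singular value decomposition at a threshold set from the observed spectrum (the median-based heuristic below is an illustrative stand-in, not the paper's online calibration rule):

```python
import numpy as np

def adaptive_rank_truncate(A, c=2.0):
    """Keep only singular values above a data-dependent threshold
    (c * median singular value, an illustrative heuristic)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    tau = c * np.median(s)                     # threshold set from the observed spectrum
    k = int(np.sum(s > tau))                   # number of retained dimensions
    return (U[:, :k] * s[:k]) @ Vt[:k], k

rng = np.random.default_rng(0)
signal = rng.standard_normal((200, 3)) @ rng.standard_normal((3, 50))   # rank-3 signal
A = signal + 0.3 * rng.standard_normal((200, 50))                       # plus noise
A_hat, k = adaptive_rank_truncate(A)
print("retained rank:", k)                     # should recover the planted rank (3)
```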
no code implementations • 30 Nov 2015 • Cengiz Pehlevan, Dmitri B. Chklovskii
Here, we focus on such workhorses of signal processing as Principal Component Analysis (PCA) and whitening, which maximize information transmission in the presence of noise.
no code implementations • NeurIPS 2015 • Cengiz Pehlevan, Dmitri B. Chklovskii
Here, we derive biologically plausible dimensionality reduction algorithms which adapt the number of output dimensions to the eigenspectrum of the input covariance matrix.
no code implementations • 2 Mar 2015 • Tao Hu, Cengiz Pehlevan, Dmitri B. Chklovskii
Here, to overcome this problem, we derive sparse dictionary learning from a novel cost function: a regularized error of the symmetric factorization of the input's similarity matrix.
2 code implementations • 2 Mar 2015 • Cengiz Pehlevan, Dmitri B. Chklovskii
Despite our extensive knowledge of biophysical properties of neurons, there is no commonly accepted algorithmic theory of neuronal function.
no code implementations • 2 Mar 2015 • Cengiz Pehlevan, Tao Hu, Dmitri B. Chklovskii
Such networks learn the principal subspace, in the sense of principal component analysis (PCA), by adjusting synaptic weights according to activity-dependent learning rules.
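One standard example of such an activity-dependent rule is Oja's subspace rule, sketched below on streaming Gaussian inputs (a generic illustration; the specific Hebbian/anti-Hebbian network analyzed in the paper may use a different update):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, lr = 10, 2, 1e-3
C = np.diag(np.linspace(3.0, 0.5, d))                # input covariance with a known spectrum
L = np.linalg.cholesky(C)
W = 0.1 * rng.standard_normal((k, d))                # synaptic weights: k outputs, d inputs

for _ in range(20000):
    x = L @ rng.standard_normal(d)                   # stream one input sample
    y = W @ x                                        # output activities
    W += lr * (np.outer(y, x) - np.outer(y, y) @ W)  # Hebbian term minus decay (Oja subspace rule)

print(np.round(W @ W.T, 2))                          # ~ identity: rows become orthonormal
print(np.round(np.abs(W[:, :4]), 2))                 # weight mass concentrates on the leading inputs
```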
no code implementations • 12 May 2014 • Tao Hu, Zaid J. Towfic, Cengiz Pehlevan, Alex Genkin, Dmitri B. Chklovskii
Here we propose to view a neuron as a signal processing device that represents the incoming streaming data matrix as a sparse vector of synaptic weights scaled by an outgoing sparse activity vector.