no code implementations • 7 Nov 2024 • Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan
Low-precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this.
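As a rough illustration of what a precision-aware scaling law could look like (not the paper's fitted law), the sketch below plugs a hypothetical precision-dependent effective parameter count into a Chinchilla-style loss; the functional form and every constant are assumptions chosen for illustration.

```python
import numpy as np

def toy_precision_loss(N, D, P, A=400.0, B=1800.0, E=1.7,
                       alpha=0.34, beta=0.28, gamma=2.5):
    """Toy Chinchilla-style loss with a hypothetical precision penalty.

    N: parameter count, D: training tokens, P: weight precision in bits.
    The effective parameter count shrinks as precision drops; the form
    N * (1 - exp(-P / gamma)) and all constants are illustrative guesses,
    not fitted values from the paper.
    """
    N_eff = N * (1.0 - np.exp(-P / gamma))
    return A / N_eff**alpha + B / D**beta + E

for bits in (16, 8, 4):
    print(bits, round(toy_precision_loss(1e9, 2e10, bits), 3))
```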
no code implementations • 5 Nov 2024 • Tanishq Kumar, Blake Bordelon, Cengiz Pehlevan, Venkatesh N. Murthy, Samuel J. Gershman
Does learning of task-relevant representations stop when behavior stops changing?
no code implementations • 26 Sep 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
We develop a solvable model of neural scaling laws beyond the kernel limit.
no code implementations • 24 May 2024 • Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan
In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.
no code implementations • 2 Feb 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan
On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude.
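A minimal sketch of the kind of empirical fit such predictions rest on: recovering a power-law exponent from synthetic loss-versus-compute measurements in log-log space. All numbers below are made up for illustration.

```python
import numpy as np

# Fit a power law L(C) = a * C^(-b) + c to synthetic loss-vs-compute data,
# the kind of empirical regularity that joint compute/data/model scaling
# laws generalize. Exponents, prefactors, and noise level are made up.
rng = np.random.default_rng(0)
compute = np.logspace(15, 21, 20)                  # FLOPs (synthetic)
true_loss = 3.0 * compute**-0.05 + 1.6
loss = true_loss * (1 + 0.01 * rng.standard_normal(20))

# Fit the exponent on the reducible part of the loss (irreducible term
# assumed known here for simplicity).
irreducible = 1.6
slope, intercept = np.polyfit(np.log(compute), np.log(loss - irreducible), 1)
print(f"fitted exponent: {-slope:.3f}, prefactor: {np.exp(intercept):.3f}")
```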
no code implementations • 9 Oct 2023 • Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan
We identify sufficient statistics for the test loss of such a network. Tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features; late-time feature learning then identifies a generalizing solution after the train loss is already low.
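A minimal numerical sketch of that first, "kernel" phase, under a toy setup of my own choosing: ridge regression on a network's frozen initial random ReLU features can interpolate the training set while generalizing poorly, which is the pre-grokking behavior the excerpt refers to.

```python
import numpy as np

# Ridge regression on frozen initial random ReLU features: near-zero train
# loss, poor test loss. Task, widths, and scales are illustrative choices,
# not the paper's setup.
rng = np.random.default_rng(1)
d, width, n_train, n_test = 30, 2000, 100, 2000
W0 = rng.standard_normal((d, width)) / np.sqrt(d)   # frozen initial weights

def feats(X):
    return np.maximum(X @ W0, 0.0)                  # ReLU random features

X_tr, X_te = rng.standard_normal((n_train, d)), rng.standard_normal((n_test, d))
y_tr, y_te = np.sign(X_tr[:, 0] * X_tr[:, 1]), np.sign(X_te[:, 0] * X_te[:, 1])

Phi = feats(X_tr)
ridge = 1e-6
w = np.linalg.solve(Phi.T @ Phi + ridge * np.eye(width), Phi.T @ y_tr)
print("train mse:", np.mean((Phi @ w - y_tr) ** 2))
print("test  mse:", np.mean((feats(X_te) @ w - y_te) ** 2))
```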
no code implementations • 28 Sep 2023 • Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan
We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
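The sketch below illustrates one ingredient commonly associated with depth-wise hyperparameter transfer, namely scaling each residual branch by 1/sqrt(depth) so that hidden activations stay O(1) as depth grows; this is a paraphrase under assumptions, not the paper's exact parameterization, and the width, depths, and nonlinearity are arbitrary.

```python
import numpy as np

# Numerical sketch (assumptions, not the paper's exact rules): scaling each
# residual branch by 1/sqrt(depth) keeps the hidden-state norm at O(1) as
# depth grows, one ingredient behind transferring hyperparameters across depth.
rng = np.random.default_rng(0)

def residual_forward(x, depth, width, branch_scale):
    h = x
    for _ in range(depth):
        W = rng.standard_normal((width, width)) / np.sqrt(width)
        h = h + branch_scale * np.tanh(W @ h)
    return h

width = 256
x = rng.standard_normal(width)
for depth in (4, 64, 1024):
    h = residual_forward(x, depth, width, branch_scale=1.0 / np.sqrt(depth))
    print(depth, round(float(np.sqrt(np.mean(h ** 2))), 3))
```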
1 code implementation • NeurIPS 2023 • Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan
We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.
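A minimal sketch of the setting in question, under illustrative assumptions: TD(0) value learning with fixed linear features on a small random-walk chain, where the feature matrix, learning rate, discount factor, and reward placement are exactly the knobs whose effect on the learning dynamics is being studied.

```python
import numpy as np

# TD(0) with linear function approximation on a reflecting random-walk
# chain. The chain, random features, and hyperparameters are illustrative.
rng = np.random.default_rng(0)
n_states, gamma, lr, n_steps = 10, 0.9, 0.05, 20000

Phi = rng.standard_normal((n_states, 4)) / 2.0       # fixed random features
reward = np.zeros(n_states); reward[-1] = 1.0        # reward at the right end

w = np.zeros(4)
s = 0
for t in range(n_steps):
    s_next = min(max(s + rng.choice([-1, 1]), 0), n_states - 1)
    td_err = reward[s_next] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += lr * td_err * Phi[s]                        # TD(0) semi-gradient update
    s = s_next
print("learned values:", np.round(Phi @ w, 2))
```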
no code implementations • NeurIPS 2023 • Nikhil Vyas, Alexander Atanasov, Blake Bordelon, Depen Morwani, Sabarish Sainathan, Cengiz Pehlevan
We call this the bias of narrower width.
1 code implementation • NeurIPS 2023 • Blake Bordelon, Cengiz Pehlevan
However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.
1 code implementation • 23 Dec 2022 • Alexander Atanasov, Blake Bordelon, Sabarish Sainathan, Cengiz Pehlevan
For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime.
no code implementations • 5 Oct 2022 • Blake Bordelon, Cengiz Pehlevan
In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices.
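For concreteness, here is a minimal sketch of a direct feedback alignment (DFA) update on a two-layer network, in which the output error reaches the hidden layer through a fixed random feedback vector rather than the transpose of the forward weights; the task, sizes, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

# Two-layer ReLU network trained with DFA: the readout uses the usual delta
# rule, while the hidden layer receives the error through a fixed random
# feedback vector B instead of the forward weights' transpose.
rng = np.random.default_rng(0)
d, h, n = 20, 100, 500
X = rng.standard_normal((n, d)); y = np.sign(X[:, 0])

W1 = rng.standard_normal((d, h)) / np.sqrt(d)
w2 = rng.standard_normal(h) / np.sqrt(h)
B = rng.standard_normal(h)                     # fixed random feedback vector
lr = 0.05
for epoch in range(200):
    a = np.maximum(X @ W1, 0.0)                # hidden activations
    err = a @ w2 - y                           # output error
    w2 -= lr * a.T @ err / n                   # delta rule at the readout
    delta_h = np.outer(err, B) * (a > 0)       # DFA: error routed via B
    W1 -= lr * X.T @ delta_h / n
print("train mse:", np.mean((np.maximum(X @ W1, 0.0) @ w2 - y) ** 2))
```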
no code implementations • 19 May 2022 • Blake Bordelon, Cengiz Pehlevan
We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory.
no code implementations • ICLR 2022 • Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan
Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel?
1 code implementation • ICLR 2022 • Matthew Farrell, Blake Bordelon, Shubhendu Trivedi, Cengiz Pehlevan
We find that the fraction of separable dichotomies is determined by the dimension of the space that is fixed by the group action.
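As I read that statement, the natural baseline is Cover's function-counting theorem, which gives the fraction of dichotomies of P points in general position in R^N that a linear readout can realize; the sketch below evaluates that formula, with the understanding that the result summarized above substitutes the dimension of the subspace fixed by the group action for N.

```python
from math import comb

# Cover's function counting theorem: the number of linearly realizable
# dichotomies of P points in general position in R^N is
# C(P, N) = 2 * sum_{k=0}^{N-1} binom(P-1, k); divide by 2^P for the fraction.
def cover_fraction(P, N):
    return 2 * sum(comb(P - 1, k) for k in range(N)) / 2**P

for P in (10, 20, 40):
    print(P, [round(cover_fraction(P, N), 3) for N in (5, 10, 20)])
```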
1 code implementation • NeurIPS 2021 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
Here, we study generalization in kernel regression when the training and test distributions are different using methods from statistical physics.
Tasks: BIG-bench Machine Learning, Out-of-Distribution Generalization
1 code implementation • ICLR 2022 • Blake Bordelon, Cengiz Pehlevan
To analyze the influence of data structure on test loss dynamics, we study an exactly solvable model of stochastic gradient descent (SGD) on mean square loss which predicts test loss when training on features with arbitrary covariance structure.
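A minimal empirical counterpart of that solvable model, under illustrative assumptions: one-pass SGD on mean squared loss with Gaussian features whose covariance has a power-law spectrum, tracking the test loss over steps.

```python
import numpy as np

# One-pass SGD on a linear teacher with power-law feature covariance.
# The spectrum, teacher, and learning rate are illustrative choices.
rng = np.random.default_rng(0)
d, lr, n_steps = 200, 0.05, 5000
eigs = (1.0 + np.arange(d)) ** -1.5            # power-law feature covariance
w_star = rng.standard_normal(d)                # teacher weights
w = np.zeros(d)

def test_loss(w):
    # E[(x·(w - w*))^2] / 2 for x ~ N(0, diag(eigs))
    return 0.5 * np.sum(eigs * (w - w_star) ** 2)

for t in range(n_steps):
    x = np.sqrt(eigs) * rng.standard_normal(d)  # feature with covariance diag(eigs)
    err = x @ w - x @ w_star
    w -= lr * err * x
    if t % 1000 == 0:
        print(t, round(test_loss(w), 4))
```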
no code implementations • 29 May 2021 • Haozhe Shan, Blake Bordelon
In this work, we seek to theoretically understand kernel alignment, a prominent and ubiquitous structural change that aligns the neural tangent kernel (NTK) with the target function.
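A minimal sketch of the alignment measure at play, assuming a stand-in kernel: kernel-target alignment, the cosine similarity between a kernel Gram matrix and the target Gram matrix y yᵀ. Here the kernel comes from random ReLU features rather than an actual empirical NTK, which is an illustrative simplification.

```python
import numpy as np

# Kernel-target alignment A(K, y y^T) = y^T K y / (||K||_F ||y||^2),
# computed for a random-feature kernel as a stand-in for an empirical NTK.
rng = np.random.default_rng(0)
n, d, width = 200, 10, 500
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0])

Phi = np.maximum(X @ rng.standard_normal((d, width)) / np.sqrt(d), 0.0)
K = Phi @ Phi.T / width

alignment = (y @ K @ y) / (np.linalg.norm(K, 'fro') * (y @ y))
print("kernel-target alignment:", round(float(alignment), 4))
```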
1 code implementation • 23 Jun 2020 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan
We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit.
1 code implementation • ICML 2020 • Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan
We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.
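A minimal empirical counterpart of the quantity those expressions predict: the test error of RBF kernel ridge regression as a function of the number of training samples, averaged over data draws. The kernel width, ridge, and target below are illustrative choices, not the paper's setup.

```python
import numpy as np

# Empirical learning curve: RBF kernel ridge regression test error vs. the
# number of training samples P, averaged over a few random data draws.
rng = np.random.default_rng(0)
d, ridge, n_test = 5, 1e-3, 500

def rbf(X, Y, ell=2.0):
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell**2))

def target(X):
    return np.sin(X[:, 0]) + 0.5 * X[:, 1]

for P in (10, 40, 160, 640):
    errs = []
    for _ in range(5):
        X = rng.standard_normal((P, d)); Xt = rng.standard_normal((n_test, d))
        alpha = np.linalg.solve(rbf(X, X) + ridge * np.eye(P), target(X))
        errs.append(np.mean((rbf(Xt, X) @ alpha - target(Xt)) ** 2))
    print(P, round(float(np.mean(errs)), 4))
```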
1 code implementation • 7 Oct 2018 • Bryce Bagley, Blake Bordelon, Benjamin Moseley, Ralf Wessel
Learning synaptic weights of spiking neural network (SNN) models that can reproduce target spike trains from provided neural firing data is a central problem in computational neuroscience and spike-based computing.