Search Results for author: Blake Bordelon

Found 21 papers, 9 papers with code

Scaling Laws for Precision

no code implementations • 7 Nov 2024 • Tanishq Kumar, Zachary Ankner, Benjamin F. Spector, Blake Bordelon, Niklas Muennighoff, Mansheej Paul, Cengiz Pehlevan, Christopher Ré, Aditi Raghunathan

Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this.

Quantization

How Feature Learning Can Improve Neural Scaling Laws

no code implementations • 26 Sep 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

We develop a solvable model of neural scaling laws beyond the kernel limit.

Infinite Limits of Multi-head Transformer Dynamics

no code implementations • 24 May 2024 • Blake Bordelon, Hamza Tahir Chaudhry, Cengiz Pehlevan

In this work, we analyze various scaling limits of the training dynamics of transformer models in the feature learning regime.

A Dynamical Model of Neural Scaling Laws

no code implementations • 2 Feb 2024 • Blake Bordelon, Alexander Atanasov, Cengiz Pehlevan

On a variety of tasks, the performance of neural networks predictably improves with training time, dataset size and model size across many orders of magnitude.
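
The claim of predictable improvement is usually made concrete by fitting a power-law ansatz to loss-versus-resource data. Below is a minimal sketch of such a fit on synthetic data; the functional form, exponent, and noise level are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (not the paper's model): fit a power-law scaling ansatz
#   L(x) = L_inf + a * x**(-b)
# to synthetic loss-vs-resource data, where x could stand for training time,
# dataset size, or model size.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(x, L_inf, a, b):
    return L_inf + a * x ** (-b)

rng = np.random.default_rng(0)
x = np.logspace(2, 6, 20)                                # hypothetical resource axis
true = scaling_law(x, 0.1, 5.0, 0.35)                    # assumed ground-truth curve
loss = true * (1 + 0.02 * rng.standard_normal(x.size))   # noisy observations

params, _ = curve_fit(scaling_law, x, loss, p0=[0.0, 1.0, 0.5])
print("fitted irreducible loss, prefactor, exponent:", params)
```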

Grokking as the Transition from Lazy to Rich Training Dynamics

no code implementations • 9 Oct 2023 • Tanishq Kumar, Blake Bordelon, Samuel J. Gershman, Cengiz Pehlevan

We identify sufficient statistics for the test loss of such a network, and tracking these over training reveals that grokking arises in this setting when the network first attempts to fit a kernel regression solution with its initial features, followed by late-time feature learning where a generalizing solution is identified after train loss is already low.

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

no code implementations • 28 Sep 2023 • Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.
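
One common ingredient of width/depth-robust parameterizations is scaling residual branches by 1/sqrt(depth). The sketch below illustrates that idea in a toy residual MLP; it is an assumption for illustration only and is not claimed to be the exact parameterization proposed in the paper.

```python
# Illustrative sketch: forward pass of a residual MLP whose branch
# contributions are scaled by 1/sqrt(depth). Sizes and scalings are
# assumptions, not the paper's exact scheme.
import numpy as np

def resmlp_forward(x, weights, branch_scale):
    """Residual MLP forward pass with branches scaled by branch_scale."""
    h = x
    for W in weights:
        h = h + branch_scale * np.maximum(W @ h, 0.0)   # ReLU branch + residual add
    return h

width, depth = 256, 12
rng = np.random.default_rng(0)
weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
x = rng.standard_normal(width)
out = resmlp_forward(x, weights, branch_scale=depth ** -0.5)
```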

Loss Dynamics of Temporal Difference Reinforcement Learning

1 code implementation • NeurIPS 2023 • Blake Bordelon, Paul Masset, Henry Kuo, Cengiz Pehlevan

We study how learning dynamics and plateaus depend on feature structure, learning rate, discount factor, and reward function.

reinforcement-learning • Reinforcement Learning
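
The quantities named in the abstract (feature structure, learning rate, discount factor, reward function) all appear explicitly in the standard TD(0) update with linear function approximation. Here is a minimal simulation of that setting on a random-walk MDP; the hyperparameters are illustrative, not taken from the paper.

```python
# Minimal TD(0) sketch with linear features on a random-walk MDP, the kind of
# setting where learning rate, discount factor, features, and rewards shape
# the loss dynamics. All constants are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_features = 10, 4
Phi = rng.standard_normal((n_states, n_features))      # state feature matrix
reward = rng.standard_normal(n_states)                 # per-state reward
gamma, lr, n_steps = 0.9, 0.05, 5000

w = np.zeros(n_features)
s = 0
for _ in range(n_steps):
    s_next = (s + rng.choice([-1, 1])) % n_states      # random-walk transition
    td_error = reward[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w += lr * td_error * Phi[s]                        # TD(0) semi-gradient update
    s = s_next

print("learned value estimates:", Phi @ w)
```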

Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks

1 code implementation • NeurIPS 2023 • Blake Bordelon, Cengiz Pehlevan

However, in the rich, feature learning regime, the fluctuations of the kernels and predictions are dynamically coupled with a variance that can be computed self-consistently.

The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes

1 code implementation • 23 Dec 2022 • Alexander Atanasov, Blake Bordelon, Sabarish Sainathan, Cengiz Pehlevan

For small training set sizes $P$, the generalization error of wide neural networks is well-approximated by the error of an infinite width neural network (NN), either in the kernel or mean-field/feature-learning regime.

regression

The Influence of Learning Rule on Representation Dynamics in Wide Neural Networks

no code implementations • 5 Oct 2022 • Blake Bordelon, Cengiz Pehlevan

In the lazy limit, we find that DFA and Hebb can only learn using the last layer features, while full FA can utilize earlier layers with a scale determined by the initial correlation between feedforward and feedback weight matrices.
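
For readers unfamiliar with direct feedback alignment (DFA): the defining feature is that the output error is routed to hidden layers through a fixed random feedback matrix rather than the transposed forward weights. The sketch below shows one standard DFA step in a two-layer network; sizes, learning rate, and nonlinearity are illustrative assumptions.

```python
# Sketch of a single direct feedback alignment (DFA) step in a two-layer
# network: the error reaches the hidden layer through a fixed random feedback
# matrix B instead of W2.T. Sizes and learning rate are illustrative.
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hid, d_out, lr = 20, 64, 5, 0.01
W1 = rng.standard_normal((d_hid, d_in)) / np.sqrt(d_in)
W2 = rng.standard_normal((d_out, d_hid)) / np.sqrt(d_hid)
B = rng.standard_normal((d_hid, d_out)) / np.sqrt(d_out)   # fixed feedback weights

x = rng.standard_normal(d_in)
y = rng.standard_normal(d_out)

h = np.tanh(W1 @ x)            # hidden activations
y_hat = W2 @ h                 # linear readout
err = y_hat - y                # output error

W2 -= lr * np.outer(err, h)                      # usual delta rule at the readout
W1 -= lr * np.outer((B @ err) * (1 - h**2), x)   # DFA: random feedback B, not W2.T
```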

Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks

no code implementations • 19 May 2022 • Blake Bordelon, Cengiz Pehlevan

We analyze feature learning in infinite-width neural networks trained with gradient flow through a self-consistent dynamical field theory.

Neural Networks as Kernel Learners: The Silent Alignment Effect

no code implementations • ICLR 2022 • Alexander Atanasov, Blake Bordelon, Cengiz Pehlevan

Can neural networks in the rich feature learning regime learn a kernel machine with a data-dependent kernel?

Out-of-Distribution Generalization in Kernel Regression

1 code implementation • NeurIPS 2021 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

Here, we study generalization in kernel regression when the training and test distributions are different using methods from statistical physics.

BIG-bench Machine Learning • Out-of-Distribution Generalization • +1
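
A quick numerical picture of the train/test distribution-shift setting: kernel ridge regression with an RBF kernel fit on inputs from one Gaussian and evaluated on a shifted, wider Gaussian. This is purely illustrative; the paper's contribution is an analytical treatment, and the target function, shift, and ridge value below are assumptions.

```python
# Numerical sketch of out-of-distribution kernel regression: RBF kernel ridge
# regression trained on N(0, 1) inputs and tested on shifted N(1, 1.5^2) inputs.
import numpy as np

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
d, P, P_test, ridge = 5, 200, 500, 1e-3
target = lambda X: np.sin(X.sum(axis=1))          # assumed target function

X_train = rng.normal(0.0, 1.0, (P, d))            # training distribution
X_test = rng.normal(1.0, 1.5, (P_test, d))        # shifted test distribution
y_train = target(X_train)

alpha = np.linalg.solve(rbf(X_train, X_train) + ridge * np.eye(P), y_train)
y_pred = rbf(X_test, X_train) @ alpha
print("OOD test MSE:", np.mean((y_pred - target(X_test)) ** 2))
```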

Learning Curves for SGD on Structured Features

1 code implementation • ICLR 2022 • Blake Bordelon, Cengiz Pehlevan

To analyze the influence of data structure on test loss dynamics, we study an exactly solvable model of stochastic gradient descent (SGD) on mean squared loss, which predicts the test loss when training on features with arbitrary covariance structure.

BIG-bench Machine Learning • Feature Correlation • +1
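
The setting described above is easy to simulate directly: single-sample SGD on squared loss with Gaussian features whose covariance has a chosen spectrum. The sketch below tracks the exact test loss during training; the power-law spectrum and hyperparameters are illustrative assumptions, and the paper's result is the analytical prediction of these curves.

```python
# Simulation sketch of single-sample SGD on squared loss with Gaussian features
# whose covariance has a power-law spectrum (exponent chosen for illustration).
import numpy as np

rng = np.random.default_rng(0)
d, lr, n_steps = 200, 0.05, 20000
eigs = np.arange(1, d + 1, dtype=float) ** -1.5        # power-law feature spectrum
w_star = rng.standard_normal(d)                        # target weights
w = np.zeros(d)

test_loss = []
for t in range(n_steps):
    x = np.sqrt(eigs) * rng.standard_normal(d)         # feature with covariance diag(eigs)
    y = w_star @ x
    w -= lr * (w @ x - y) * x                          # SGD step on squared loss
    if t % 1000 == 0:
        test_loss.append(0.5 * np.sum(eigs * (w - w_star) ** 2))  # exact test loss

print(test_loss)
```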

A Theory of Neural Tangent Kernel Alignment and Its Influence on Training

no code implementations • 29 May 2021 • Haozhe Shan, Blake Bordelon

In this work, we seek to theoretically understand kernel alignment, a prominent and ubiquitous structural change that aligns the NTK with the target function.
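
Kernel-target alignment is commonly quantified as A(K, y) = y^T K y / (||K||_F ||y||^2). The snippet below computes this standard measure for an empirical kernel as a stand-in for the NTK; the paper's precise definition may differ.

```python
# Sketch of the standard kernel-target alignment measure
#   A(K, y) = y^T K y / (||K||_F * ||y||^2)
# computed for an empirical kernel (a linear kernel stands in for the NTK here).
import numpy as np

def kernel_target_alignment(K, y):
    return (y @ K @ y) / (np.linalg.norm(K, "fro") * (y @ y))

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 10))
y = np.sign(X[:, 0])                  # simple illustrative target
K = X @ X.T                           # linear kernel as a stand-in for the NTK
print("alignment:", kernel_target_alignment(K, y))
```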

Spectral Bias and Task-Model Alignment Explain Generalization in Kernel Regression and Infinitely Wide Neural Networks

1 code implementation • 23 Jun 2020 • Abdulkadir Canatar, Blake Bordelon, Cengiz Pehlevan

We present applications of our theory to real and synthetic datasets, and for many kernels including those that arise from training deep neural networks in the infinite-width limit.

BIG-bench Machine Learning • Inductive Bias • +1

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks

1 code implementation • ICML 2020 • Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan

We derive analytical expressions for the generalization performance of kernel regression as a function of the number of training samples using theoretical methods from Gaussian processes and statistical physics.

Gaussian Processes • regression
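
The generalization-versus-sample-size curve studied analytically in the paper can be measured empirically in a few lines: fit kernel ridge regression at increasing training set sizes P and average the test error over dataset draws. The kernel, target, and ridge below are assumptions for illustration.

```python
# Empirical learning curve for RBF kernel ridge regression: average test error
# as a function of the number of training samples P.
import numpy as np

def rbf(X, Z, ell=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

rng = np.random.default_rng(0)
d, ridge = 3, 1e-4
target = lambda X: np.cos(X.sum(axis=1))                  # assumed target function
X_test = rng.standard_normal((500, d))

for P in [10, 30, 100, 300]:
    errs = []
    for _ in range(10):                                   # average over dataset draws
        X = rng.standard_normal((P, d))
        alpha = np.linalg.solve(rbf(X, X) + ridge * np.eye(P), target(X))
        pred = rbf(X_test, X) @ alpha
        errs.append(np.mean((pred - target(X_test)) ** 2))
    print(f"P={P:4d}  mean test error={np.mean(errs):.4f}")
```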

Pre-Synaptic Pool Modification (PSPM): A Supervised Learning Procedure for Spiking Neural Networks

1 code implementation • 7 Oct 2018 • Bryce Bagley, Blake Bordelon, Benjamin Moseley, Ralf Wessel

Learning synaptic weights of spiking neural network (SNN) models that can reproduce target spike trains from provided neural firing data is a central problem in computational neuroscience and spike-based computing.
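
To make the problem statement concrete, here is a minimal leaky integrate-and-fire (LIF) simulation that compares a neuron's output spike train against a target train under fixed synaptic weights. This sketches only the setup the paper addresses; it does not implement PSPM, and all constants are illustrative.

```python
# Sketch of the problem setup only (not the PSPM rule): a leaky integrate-and-
# fire neuron driven by fixed input spikes through synaptic weights, with its
# output spike train compared against a target train.
import numpy as np

rng = np.random.default_rng(0)
n_in, T, dt = 20, 500, 1.0                 # inputs, time steps, step size (ms)
tau, v_thresh, v_reset = 20.0, 1.0, 0.0
w = rng.uniform(0.0, 0.1, n_in)            # synaptic weights to be learned
inputs = rng.random((T, n_in)) < 0.05      # Poisson-like input spike raster
target = rng.random(T) < 0.02              # target output spike train

v, out = 0.0, np.zeros(T, dtype=bool)
for t in range(T):
    v += dt / tau * (-v) + w @ inputs[t]   # leaky integration plus synaptic drive
    if v >= v_thresh:
        out[t], v = True, v_reset

print("spike-train mismatch:", np.sum(out != target), "of", T, "bins")
```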
