Search Results for author: Atish Agarwala

Found 9 papers, 0 papers with code

Gradient descent induces alignment between weights and the empirical NTK for deep non-linear networks

no code implementations7 Feb 2024 Daniel Beaglehole, Ioannis Mitliagkas, Atish Agarwala

Prior works have identified that the Gram matrices of the weights in trained neural networks of general architectures are proportional to the average gradient outer product of the model, a statement known as the Neural Feature Ansatz (NFA).
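
For orientation, here is a minimal numpy sketch of the two quantities the NFA relates: the Gram matrix of the first-layer weights, W1^T W1, and the average gradient outer product (AGOP) of the model with respect to its inputs. The two-layer ReLU network, dimensions, and random data are hypothetical; the ansatz itself concerns trained networks, and this sketch only shows how the quantities are computed.

```python
import numpy as np

# Hypothetical two-layer ReLU network f(x) = w2 . relu(W1 x) at random init.
rng = np.random.default_rng(0)
d, h, n = 10, 64, 500                       # input dim, hidden width, samples
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
X = rng.normal(size=(n, d))

# Gram matrix of the first-layer weights.
weight_gram = W1.T @ W1                     # (d, d)

# Average gradient outer product: mean over samples of grad_x f(x) grad_x f(x)^T,
# where grad_x f(x) = W1^T (w2 * 1[W1 x > 0]) for this network.
agop = np.zeros((d, d))
for x in X:
    pre = W1 @ x
    grad_x = W1.T @ (w2 * (pre > 0))
    agop += np.outer(grad_x, grad_x)
agop /= n

# The NFA states these become (approximately) proportional during training;
# at random initialization the alignment is typically weak.
cos = np.sum(weight_gram * agop) / (np.linalg.norm(weight_gram) * np.linalg.norm(agop))
print("cosine similarity between W1^T W1 and AGOP:", cos)
```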

Neglected Hessian component explains mysteries in Sharpness regularization

no code implementations19 Jan 2024 Yann N. Dauphin, Atish Agarwala, Hossein Mobahi

We find that regularizing feature exploitation but not feature exploration yields performance similar to gradient penalties.
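
For reference, the gradient penalties mentioned above add a term proportional to the squared gradient norm to the training objective. A minimal numpy sketch on a hypothetical toy loss (not the paper's setup), using a finite-difference Hessian-vector product to differentiate the penalty:

```python
import numpy as np

# Hypothetical toy loss in two parameters.
def loss(theta):
    x, y = theta
    return 0.25 * x**4 + 0.5 * y**2

def grad(theta):
    x, y = theta
    return np.array([x**3, y])

def penalized_grad(theta, lam=0.1, eps=1e-4):
    """Gradient of loss(theta) + lam/2 * ||grad(theta)||^2.

    The penalty contributes lam * H(theta) @ grad(theta); the Hessian-vector
    product is approximated with a finite difference along the gradient.
    """
    g = grad(theta)
    hvp = (grad(theta + eps * g) - g) / eps
    return g + lam * hvp

theta = np.array([2.0, -1.0])
for _ in range(200):
    theta -= 0.05 * penalized_grad(theta)
print("final parameters:", theta)
```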

On the Interplay Between Stepsize Tuning and Progressive Sharpening

no code implementations30 Nov 2023 Vincent Roulet, Atish Agarwala, Fabian Pedregosa

Recent empirical work has revealed an intriguing property of deep learning models: for a fixed stepsize, the sharpness (the largest eigenvalue of the Hessian) increases throughout optimization until it stabilizes around a critical value at which the optimizer operates at the edge of stability (Cohen et al., 2022).
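
The sharpness referred to here can be estimated with power iteration on Hessian-vector products. A minimal numpy sketch on a hypothetical quadratic-plus-quartic loss, with finite-difference HVPs standing in for automatic differentiation:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T / 5                              # hypothetical PSD curvature matrix

def grad(theta):
    # Gradient of the toy loss 0.5 * theta^T A theta + 0.1 * sum(theta^4).
    return A @ theta + 0.4 * theta**3

def hvp(theta, v, eps=1e-4):
    # Central-difference Hessian-vector product.
    return (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)

def sharpness(theta, iters=100):
    # Power iteration on the Hessian estimates its largest eigenvalue.
    v = np.random.default_rng(1).normal(size=theta.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        hv = hvp(theta, v)
        v = hv / np.linalg.norm(hv)
    return v @ hvp(theta, v)

theta = rng.normal(size=5)
print("estimated sharpness:", sharpness(theta))
```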

SAM operates far from home: eigenvalue regularization as a dynamical phenomenon

no code implementations17 Feb 2023 Atish Agarwala, Yann N. Dauphin

We show that in a simplified setting, SAM dynamically induces a stabilization related to the edge of stability (EOS) phenomenon observed in large learning rate gradient descent.
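
For context, SAM (Sharpness-Aware Minimization) perturbs the parameters along the normalized gradient by a radius rho, then applies the gradient computed at the perturbed point. A minimal numpy sketch of that update on a hypothetical toy loss (not the simplified setting analyzed in the paper):

```python
import numpy as np

# Hypothetical toy loss x^4 - x^2 + 5 y^2 with an asymmetric landscape.
def grad(theta):
    x, y = theta
    return np.array([4 * x**3 - 2 * x, 10 * y])

def sam_step(theta, lr=0.01, rho=0.05):
    """One SAM step.

    1. Move to the approximate worst-case point within radius rho
       (one normalized-gradient ascent step).
    2. Descend using the gradient evaluated at that perturbed point.
    """
    g = grad(theta)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    return theta - lr * grad(theta + eps)

theta = np.array([1.5, 0.7])
for _ in range(500):
    theta = sam_step(theta)
print("parameters after SAM:", theta)
```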

Second-order regression models exhibit progressive sharpening to the edge of stability

no code implementations10 Oct 2022 Atish Agarwala, Fabian Pedregosa, Jeffrey Pennington

Recent studies of gradient descent with large step sizes have shown that there is often a regime with an initial increase in the largest eigenvalue of the loss Hessian (progressive sharpening), followed by stabilization of the eigenvalue near the maximum value that allows convergence (edge of stability).
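
The "maximum value that allows convergence" comes from the classic stability condition for gradient descent on a quadratic: with stepsize eta, the iterates contract only while the largest Hessian eigenvalue stays below 2/eta. A minimal numpy sketch of that threshold on a one-dimensional quadratic (the curvature values are hypothetical):

```python
import numpy as np

def run_gd(curvature, lr, steps=50, x0=1.0):
    # Gradient descent on the quadratic 0.5 * curvature * x^2.
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x
    return x

lr = 0.1                                     # stability threshold is 2 / lr = 20
for curvature in (15.0, 19.9, 20.1, 25.0):
    print(f"lambda = {curvature:5.1f}: |x_final| = {abs(run_gd(curvature, lr)):.3e}")
```

Curvatures below 20 shrink the iterate toward zero, while curvatures above 20 make it grow, which is why the sharpness stabilizing near 2/eta marks the edge of stability.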

Deep equilibrium networks are sensitive to initialization statistics

no code implementations19 Jul 2022 Atish Agarwala, Samuel S. Schoenholz

Deep equilibrium networks (DEQs) are a promising way to construct models that trade off memory for compute.
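
For reference, a DEQ layer defines its output implicitly as a fixed point z* = f(z*, x) of a single transformation, found by iteration rather than by stacking layers. A minimal numpy sketch of the forward pass with a hypothetical tanh update (the weight scale is kept small so the iteration is a contraction):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_hidden = 8, 16
W = 0.3 * rng.normal(size=(d_hidden, d_hidden)) / np.sqrt(d_hidden)  # small scale -> contraction
U = rng.normal(size=(d_hidden, d_in)) / np.sqrt(d_in)
b = np.zeros(d_hidden)

def deq_forward(x, tol=1e-6, max_iter=200):
    """Find the fixed point z* = tanh(W z* + U x + b) by naive iteration."""
    z = np.zeros(d_hidden)
    for _ in range(max_iter):
        z_next = np.tanh(W @ z + U @ x + b)
        if np.linalg.norm(z_next - z) < tol:
            break
        z = z_next
    return z_next

x = rng.normal(size=d_in)
z_star = deq_forward(x)
print("fixed-point residual:", np.linalg.norm(z_star - np.tanh(W @ z_star + U @ x + b)))
```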

Temperature check: theory and practice for training models with softmax-cross-entropy losses

no code implementations14 Oct 2020 Atish Agarwala, Jeffrey Pennington, Yann Dauphin, Sam Schoenholz

In this work we develop a theory of early learning for models trained with softmax-cross-entropy loss and show that the learning dynamics depend crucially on the inverse-temperature $\beta$ as well as the magnitude of the logits at initialization, $||\beta{\bf z}||_{2}$.
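
A minimal numpy sketch of the quantities in that statement: the softmax-cross-entropy loss with inverse temperature beta applied to the logits z, and the initial logit magnitude ||beta z||_2. The random logits and label are hypothetical.

```python
import numpy as np

def softmax_xent(logits, label, beta=1.0):
    """Softmax cross-entropy with inverse temperature beta scaling the logits."""
    scaled = beta * logits
    scaled -= scaled.max()                    # subtract max for numerical stability
    log_probs = scaled - np.log(np.exp(scaled).sum())
    return -log_probs[label]

rng = np.random.default_rng(0)
logits = rng.normal(size=10)                  # hypothetical logits at initialization
label = 3
for beta in (0.1, 1.0, 10.0):
    loss = softmax_xent(logits, label, beta)
    norm = np.linalg.norm(beta * logits)
    print(f"beta = {beta:5.1f}  ||beta*z||_2 = {norm:7.3f}  loss = {loss:.3f}")
```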

Learning the gravitational force law and other analytic functions

no code implementations15 May 2020 Atish Agarwala, Abhimanyu Das, Rina Panigrahy, Qiuyi Zhang

We present experimental evidence that the many-body gravitational force function is easier to learn with ReLU networks as compared to networks with exponential activations.
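
For context, the many-body gravitational force target is the sum of pairwise inverse-square attractions on each body. A minimal numpy sketch of that target function (unit masses and G = 1 are hypothetical simplifications; the learning experiment itself is not reproduced):

```python
import numpy as np

def gravitational_forces(positions, masses, G=1.0):
    """Net gravitational force on each body from all other bodies."""
    n = len(masses)
    forces = np.zeros_like(positions)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            r = positions[j] - positions[i]
            dist = np.linalg.norm(r)
            forces[i] += G * masses[i] * masses[j] * r / dist**3
    return forces

rng = np.random.default_rng(0)
n_bodies = 5
positions = rng.uniform(-1.0, 1.0, size=(n_bodies, 3))
masses = np.ones(n_bodies)
print(gravitational_forces(positions, masses))
```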
