Search Results for author: Alexandru Meterez

Found 2 papers, 1 paper with code

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

no code implementations · 27 Feb 2024 · Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e. the sharpness) is largely independent of the width and depth of the network for a sustained period of training time.
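The "sharpness" referred to here is the top eigenvalue of the training-loss Hessian, which is typically too large to form explicitly and is instead estimated with Hessian-vector products. Below is a minimal, hedged sketch (not the authors' code; `model`, `loss_fn`, `data`, and `target` are placeholder names) of how such an estimate can be obtained by power iteration in PyTorch:

```python
import torch

def top_hessian_eigenvalue(model, loss_fn, data, target, iters=20):
    # Estimate the largest eigenvalue of the loss Hessian ("sharpness")
    # via power iteration on Hessian-vector products.
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(data), target)
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random initial direction, normalized over all parameter tensors.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x ** 2).sum() for x in v))
    v = [x / norm for x in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate (grad . v) w.r.t. the parameters.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()  # Rayleigh quotient
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / norm for h in hv]
    return eig
```

Tracking this quantity across networks of different widths and depths is one way to probe the kind of width/depth independence the abstract describes.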

Towards Training Without Depth Limits: Batch Normalization Without Gradient Explosion

1 code implementation · 3 Oct 2023 · Alexandru Meterez, Amir Joudaki, Francesco Orabona, Alexander Immer, Gunnar Rätsch, Hadi Daneshmand

We answer this question in the affirmative by giving a particular construction of a Multi-Layer Perceptron (MLP) with linear activations and batch normalization that provably has bounded gradients at any depth.
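For concreteness, a hedged sketch of the setting the abstract describes (an assumed architecture, not necessarily the paper's exact construction): a deep MLP with linear (identity) activations and batch normalization after every linear layer.

```python
import torch.nn as nn

def linear_bn_mlp(width: int, depth: int) -> nn.Sequential:
    # Stack `depth` blocks of (Linear -> BatchNorm) with no nonlinearity,
    # i.e. a linearly-activated MLP with batch normalization.
    layers = []
    for _ in range(depth):
        layers.append(nn.Linear(width, width, bias=False))
        layers.append(nn.BatchNorm1d(width))  # normalizes each feature over the batch
    return nn.Sequential(*layers)

# Example: a 100-layer network of width 256; the question studied is whether
# gradients through such a stack remain bounded as depth grows.
model = linear_bn_mlp(width=256, depth=100)
```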
