Search Results for author: Lorenzo Noci

Found 12 papers, 1 paper with code

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

no code implementations27 Feb 2024 Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e., the sharpness) is largely independent of the width and depth of the network for a sustained period of training time.
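For context, the sharpness referred to here is the largest eigenvalue of the training-loss Hessian, which is commonly estimated with power iteration over Hessian-vector products. Below is a minimal sketch of such an estimate, not taken from the paper; it assumes PyTorch, and the `model`, `loss_fn`, `inputs`, and `targets` names are illustrative placeholders.

```python
# Minimal sketch (assumption, not the paper's code): estimate the sharpness,
# i.e. the largest eigenvalue of the training-loss Hessian, via power
# iteration with Hessian-vector products.
import torch


def top_hessian_eigenvalue(model, loss_fn, inputs, targets, iters=20):
    params = [p for p in model.parameters() if p.requires_grad]
    loss = loss_fn(model(inputs), targets)
    # Keep the graph so the gradient can be differentiated again (Hessian-vector products).
    grads = torch.autograd.grad(loss, params, create_graph=True)

    # Random unit-norm starting direction with the same shapes as the parameters.
    v = [torch.randn_like(p) for p in params]
    v_norm = torch.sqrt(sum((u ** 2).sum() for u in v))
    v = [u / v_norm for u in v]

    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product: differentiate <grad(loss), v> w.r.t. the parameters.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        # Rayleigh quotient v^T H v (v has unit norm); converges to the top eigenvalue.
        eig = sum((h * u).sum() for h, u in zip(hv, v)).item()
        hv_norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (hv_norm + 1e-12) for h in hv]
    return eig
```

Tracking this quantity while scaling width and depth is one way to reproduce the kind of measurement the abstract describes.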

How Good is a Single Basin?

no code implementations5 Feb 2024 Kai Lion, Lorenzo Noci, Thomas Hofmann, Gregor Bachmann

The multi-modal nature of neural loss landscapes is often considered to be the main driver behind the empirical success of deep ensembles.

Disentangling Linear Mode-Connectivity

no code implementations15 Dec 2023 Gul Sena Altintas, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

Linear mode-connectivity (LMC) (or lack thereof) is one of the intriguing characteristics of neural network loss landscapes.

Linear Mode Connectivity

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

no code implementations28 Sep 2023 Blake Bordelon, Lorenzo Noci, Mufan Bill Li, Boris Hanin, Cengiz Pehlevan

We provide experiments demonstrating that residual architectures, including convolutional ResNets and Vision Transformers, trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet.

The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit

no code implementations NeurIPS 2023 Lorenzo Noci, Chuning Li, Mufan Bill Li, Bobby He, Thomas Hofmann, Chris Maddison, Daniel M. Roy

Motivated by the success of Transformers, we study the covariance matrix of a modified Softmax-based attention model with skip connections in the proportional limit of infinite-depth-and-width.

Deep Attention · Learning Theory

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

1 code implementation CVPR 2023 Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

In contrast to the natural capability of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performance on old tasks drops dramatically after being optimized for a new task.

Continual Learning

The Curious Case of Benign Memorization

no code implementations25 Oct 2022 Sotiris Anagnostidis, Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While such a memorization capacity seems worrisome, in this work we show that under training protocols that include data augmentation, neural networks learn to memorize entirely random labels in a benign way, i.e. they learn embeddings that lead to highly non-trivial performance under nearest neighbour probing.

Data Augmentation · Memorization

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations7 Jun 2022 Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.

How Tempering Fixes Data Augmentation in Bayesian Neural Networks

no code implementations27 May 2022 Gregor Bachmann, Lorenzo Noci, Thomas Hofmann

While data augmentation has been empirically recognized as one of the main drivers of this effect, a theoretical account of its role is largely missing.

Data Augmentation

Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect

no code implementations NeurIPS 2021 Lorenzo Noci, Kevin Roth, Gregor Bachmann, Sebastian Nowozin, Thomas Hofmann

Testing the dataset curation hypothesis of Aitchison (2020), we show empirically that the cold posterior effect (CPE) does not arise in a real curated data set but can be produced in a controlled experiment with varying curation strength.

Data Augmentation

Precise characterization of the prior predictive distribution of deep ReLU networks

no code implementations NeurIPS 2021 Lorenzo Noci, Gregor Bachmann, Kevin Roth, Sebastian Nowozin, Thomas Hofmann

Recent works on Bayesian neural networks (BNNs) have highlighted the need to better understand the implications of using Gaussian priors in combination with the compositional structure of the network architecture.

Adversarial Learning for Debiasing Knowledge Graph Embeddings

no code implementations29 Jun 2020 Mario Arduini, Lorenzo Noci, Federico Pirovano, Ce Zhang, Yash Raj Shrestha, Bibek Paudel

As a second step, we explore gender bias in KGE: a careful examination of popular KGE algorithms suggests that sensitive attributes, such as the gender of a person, can be predicted from the embeddings.

Attribute · Knowledge Graph Embeddings +2
