Search Results for author: Antonio Orvieto

Found 28 papers, 10 papers with code

On the low-shot transferability of [V]-Mamba

no code implementations • 15 Mar 2024 • Diganta Misra, Jay Gala, Antonio Orvieto

The strength of modern large-scale neural networks lies in their ability to efficiently adapt to new tasks with few examples.

Few-Shot Learning • Transfer Learning • +1

Theoretical Foundations of Deep Selective State-Space Models

no code implementations • 29 Feb 2024 • Nicola Muca Cirone, Antonio Orvieto, Benjamin Walker, Cristopher Salvi, Terry Lyons

Structured state-space models (SSMs) such as S4, stemming from the seminal work of Gu et al., are gaining popularity as effective approaches for modeling sequential data.

Why do Learning Rates Transfer? Reconciling Optimization and Scaling Limits for Deep Learning

no code implementations • 27 Feb 2024 • Lorenzo Noci, Alexandru Meterez, Thomas Hofmann, Antonio Orvieto

In this work, we find empirical evidence that learning rate transfer can be attributed to the fact that under $\mu$P and its depth extension, the largest eigenvalue of the training loss Hessian (i.e., the sharpness) is largely independent of the width and depth of the network for a sustained period of training time.
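A minimal sketch of how such sharpness can be measured, via power iteration on Hessian-vector products (the toy model and data are my own stand-ins, not the paper's setup):

```python
import torch

def sharpness(loss, params, iters=20):
    """Power iteration on Hessian-vector products to estimate the top Hessian eigenvalue."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    v = [torch.randn_like(p) for p in params]
    for _ in range(iters):
        gv = sum((g * vi).sum() for g, vi in zip(grads, v))       # <grad, v>
        hv = torch.autograd.grad(gv, params, retain_graph=True)   # Hessian-vector product
        norm = torch.sqrt(sum((h ** 2).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    gv = sum((g * vi).sum() for g, vi in zip(grads, v))
    hv = torch.autograd.grad(gv, params, retain_graph=True)
    return sum((h * vi).sum() for h, vi in zip(hv, v)).item()     # Rayleigh quotient

# Toy usage on a tiny random regression problem.
torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x, y = torch.randn(64, 10), torch.randn(64, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
print(sharpness(loss, list(model.parameters())))
```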

SDEs for Minimax Optimization

1 code implementation • 19 Feb 2024 • Enea Monzio Compagnoni, Antonio Orvieto, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi

Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning.

Recurrent Distance Filtering for Graph Representation Learning

no code implementations • 3 Dec 2023 • Yuhui Ding, Antonio Orvieto, Bobby He, Thomas Hofmann

Graph neural networks based on iterative one-hop message passing have been shown to struggle to harness information from distant nodes effectively.

Graph Representation Learning • Inductive Bias • +1
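As background, a bare-bones sketch of one-hop mean aggregation (my own minimal example, not the paper's model), which illustrates why information needs one layer per hop to travel:

```python
import numpy as np

def one_hop_mean_aggregation(adj, features):
    """One round of mean aggregation over neighbors (plus self-loop).

    adj:      (n, n) binary adjacency matrix
    features: (n, d) node features
    After k rounds, a node has only seen nodes at most k hops away,
    which is why distant nodes are hard to reach with few layers.
    """
    a = adj + np.eye(adj.shape[0])          # add self-loops
    deg = a.sum(axis=1, keepdims=True)      # node degrees
    return (a @ features) / deg             # average over the one-hop neighborhood
```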

Universality of Linear Recurrences Followed by Non-linear Projections: Finite-Width Guarantees and Benefits of Complex Eigenvalues

no code implementations • 21 Jul 2023 • Antonio Orvieto, Soham De, Caglar Gulcehre, Razvan Pascanu, Samuel L. Smith

Deep neural networks based on linear complex-valued RNNs interleaved with position-wise MLPs are gaining traction as competitive approaches to sequence modeling.

Computational Efficiency • Position
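A rough sketch of the kind of block described here, with assumed shapes and initialization rather than the authors' exact parametrization: a diagonal complex-valued linear recurrence over the sequence, followed by a position-wise MLP applied to the real part of the state.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 64, 8                                              # sequence length, state width

# Diagonal complex recurrence x_t = lam * x_{t-1} + B u_t, run elementwise in C.
lam = 0.95 * np.exp(1j * rng.uniform(0, 2 * np.pi, d))    # stable complex eigenvalues
B = rng.standard_normal((d, d)) / np.sqrt(d)
W1 = rng.standard_normal((d, d)) / np.sqrt(d)             # position-wise MLP weights
W2 = rng.standard_normal((d, d)) / np.sqrt(d)

u = rng.standard_normal((T, d))                           # input sequence
x = np.zeros(d, dtype=complex)
outputs = []
for t in range(T):
    x = lam * x + B @ u[t]                                # linear recurrence, no nonlinearity inside
    h = np.tanh(W1 @ x.real)                              # position-wise MLP acts on the real part
    outputs.append(W2 @ h)
outputs = np.stack(outputs)                               # (T, d) output sequence
```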

Achieving a Better Stability-Plasticity Trade-off via Auxiliary Networks in Continual Learning

1 code implementation • CVPR 2023 • Sanghwan Kim, Lorenzo Noci, Antonio Orvieto, Thomas Hofmann

In contrast to the natural capabilities of humans to learn new tasks in a sequential fashion, neural networks are known to suffer from catastrophic forgetting, where the model's performance on old tasks drops dramatically after it is optimized for a new task.

Continual Learning

An SDE for Modeling SAM: Theory and Insights

no code implementations • 19 Jan 2023 • Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi

We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a lot of interest due to its improved performance over more classical variants of stochastic gradient descent.
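For reference, the standard SAM update on a toy objective (a generic sketch; the paper's SDE model is not reproduced here): take an ascent step of radius rho along the normalized gradient, then descend with the gradient evaluated at the perturbed point.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One sharpness-aware minimization step for a loss with gradient grad_fn."""
    g = grad_fn(w)
    w_adv = w + rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to a nearby "sharper" point
    return w - lr * grad_fn(w_adv)                       # descend with the perturbed gradient

# Toy usage on f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad_fn=lambda v: v)
print(w)
```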

Explicit Regularization in Overparametrized Models via Noise Injection

1 code implementation • 9 Jun 2022 • Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach

Injecting noise within gradient descent has several desirable features, such as smoothing and regularizing properties.
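A minimal sketch of the kind of noise injection meant here, on a toy objective with hypothetical names: perturb the parameters before each gradient evaluation, so that on average the iterate follows the gradient of a smoothed loss.

```python
import numpy as np

def perturbed_gd(grad_fn, w, lr=0.1, sigma=0.1, steps=200, seed=0):
    """Gradient descent with a Gaussian parameter perturbation before each gradient call."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        eps = sigma * rng.standard_normal(w.shape)   # injected noise
        w = w - lr * grad_fn(w + eps)                # gradient of a smoothed objective, on average
    return w

# Toy loss f(v) = v^4 - v^2, with gradient 4v^3 - 2v.
print(perturbed_gd(lambda v: 4 * v ** 3 - 2 * v, np.array([2.0])))
```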

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations • 7 Jun 2022 • Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
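A small numerical illustration of rank collapse with random weights and no training (an assumption-laden toy, not the paper's setup): stacking softmax self-attention without skip connections or MLPs tends to drive the token representations toward a rank-one matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 16, 32                                   # tokens, embedding width
X = rng.standard_normal((n, d))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

for layer in range(20):
    Wq = rng.standard_normal((d, d)) / np.sqrt(d)
    Wk = rng.standard_normal((d, d)) / np.sqrt(d)
    Wv = rng.standard_normal((d, d)) / np.sqrt(d)
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))   # row-stochastic attention matrix
    X = A @ (X @ Wv)                                   # attention only: no skip, no MLP
    s = np.linalg.svd(X, compute_uv=False)
    # Relative distance to the nearest rank-one matrix; it tends to shrink with depth.
    print(layer, np.sqrt((s[1:] ** 2).sum() / (s ** 2).sum()))
```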

Anticorrelated Noise Injection for Improved Generalization

no code implementations • 6 Feb 2022 • Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.

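A minimal sketch contrasting independent and anticorrelated perturbations on a toy quadratic (function names are my own): in the anticorrelated variant, each injected perturbation is the difference of consecutive noise draws, so successive perturbations are negatively correlated.

```python
import numpy as np

def noisy_gd(grad_fn, w, lr=0.05, sigma=0.5, steps=500, anticorrelated=False, seed=0):
    rng = np.random.default_rng(seed)
    prev = np.zeros_like(w)
    for _ in range(steps):
        eps = sigma * rng.standard_normal(w.shape)
        xi = eps - prev if anticorrelated else eps   # anticorrelated: difference of consecutive draws
        prev = eps
        w = w - lr * grad_fn(w) + xi                 # GD step plus injected perturbation
    return w

grad = lambda v: v                                   # toy quadratic loss 0.5 * ||v||^2
print(noisy_gd(grad, np.ones(2)))                    # i.i.d. perturbations
print(noisy_gd(grad, np.ones(2), anticorrelated=True))
```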

On the effectiveness of Randomized Signatures as Reservoir for Learning Rough Dynamics

no code implementations • 2 Jan 2022 • Enea Monzio Compagnoni, Anna Scampicchio, Luca Biggio, Antonio Orvieto, Thomas Hofmann, Josef Teichmann

Many finance, physics, and engineering phenomena are modeled by continuous-time dynamical systems driven by highly irregular (stochastic) inputs.

LEMMA • Time Series • +1

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

1 code implementation • 10 Dec 2021 • Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He

Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.
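For concreteness, plain simultaneous GDA on the toy saddle f(x, y) = x·y (a generic sketch, not the algorithms developed in the paper):

```python
def gda(x, y, lr_x=0.05, lr_y=0.05, steps=200):
    """Simultaneous gradient descent ascent on f(x, y) = x * y."""
    for _ in range(steps):
        gx, gy = y, x                            # df/dx = y, df/dy = x
        x, y = x - lr_x * gx, y + lr_y * gy      # descend in x, ascend in y
    return x, y

# With equal step sizes, plain GDA spirals outward around the saddle at (0, 0),
# illustrating why even bilinear problems are delicate for single-loop methods.
print(gda(1.0, 1.0))
```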

Rethinking the Variational Interpretation of Accelerated Optimization Methods

no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand

The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.

On the Second-order Convergence Properties of Random Search Methods

1 code implementation • NeurIPS 2021 • Aurelien Lucchi, Antonio Orvieto, Adamos Solomou

We prove that this approach converges to a second-order stationary point at a much faster rate than vanilla methods: namely, the complexity in terms of the number of function evaluations is only linear in the problem dimension.
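A generic derivative-free random-search step on a toy quadratic (my own minimal variant; the paper's scheme and its second-order analysis are not reproduced): sample a random direction, probe both sides, and keep the best point.

```python
import numpy as np

def random_search(f, x, sigma=0.1, steps=500, seed=0):
    """Derivative-free random search: probe x +/- sigma*u and keep the best of the three points."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        u = rng.standard_normal(x.shape)
        u /= np.linalg.norm(u)
        candidates = [x, x + sigma * u, x - sigma * u]
        x = min(candidates, key=f)               # only function evaluations, no gradients
    return x

print(random_search(lambda v: np.sum(v ** 2), np.ones(5)))
```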

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

no code implementations • 7 Jun 2021 • Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks.
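A quick numerical probe of the phenomenon (random tanh MLP with assumed sizes; an illustration, not the paper's analysis): gradient norms typically shrink toward the early layers of a deep randomly initialized network.

```python
import torch

torch.manual_seed(0)
depth, width = 30, 128
layers = [torch.nn.Linear(width, width) for _ in range(depth)]

h = torch.randn(1, width)
for layer in layers:
    h = torch.tanh(layer(h))
loss = h.sum()
loss.backward()

# Per-layer weight gradient norms: for deep tanh networks at standard
# initialization these tend to decay toward the input layers.
for i, layer in enumerate(layers):
    print(i, layer.weight.grad.norm().item())
```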

Revisiting the Role of Euler Numerical Integration on Acceleration and Stability in Convex Optimization

no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith

Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.

Numerical Integration

Two-Level K-FAC Preconditioning for Deep Learning

no code implementations • 1 Nov 2020 • Nikolaos Tselepidis, Jonas Kohler, Antonio Orvieto

In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Stochastic Gradient Descent.


Learning explanations that are hard to vary

3 code implementations • ICLR 2021 • Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schölkopf

In this paper, we investigate the principle that 'good explanations are hard to vary' in the context of deep learning.

Memorization
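One concrete way to make 'hard to vary' operational is to keep only gradient components whose sign agrees across training environments; the sketch below illustrates that masking idea in isolation and may differ from the paper's actual procedure.

```python
import numpy as np

def and_masked_gradient(env_grads):
    """Combine per-environment gradients, zeroing components whose sign disagrees.

    env_grads: (num_envs, dim) array of gradients, one row per training environment.
    Only directions that reduce the loss in every environment survive the mask.
    """
    signs = np.sign(env_grads)
    agree = np.abs(signs.sum(axis=0)) == env_grads.shape[0]   # unanimous sign agreement
    return env_grads.mean(axis=0) * agree

g = np.array([[0.5, -1.0, 0.2],
              [0.3,  0.8, 0.1]])
print(and_masked_gradient(g))   # the second component is masked out: its signs disagree
```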

An Accelerated DFO Algorithm for Finite-sum Convex Functions

no code implementations • ICML 2020 • Yu-Wen Chen, Antonio Orvieto, Aurelien Lucchi

Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spurring interest in designing faster methods for problems where gradients are not accessible.

Practical Accelerated Optimization on Riemannian Manifolds

no code implementations • 11 Feb 2020 • Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi

We develop a new Riemannian descent algorithm with an accelerated rate of convergence.

Optimization and Control

Shadowing Properties of Optimization Algorithms

1 code implementation • NeurIPS 2019 • Antonio Orvieto, Aurelien Lucchi

Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms.

A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization

1 code implementation • 23 Oct 2019 • Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi

We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below.

Optimization and Control

The Role of Memory in Stochastic Optimization

no code implementations • 2 Jul 2019 • Antonio Orvieto, Jonas Kohler, Aurelien Lucchi

We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings.

Stochastic Optimization

Continuous-time Models for Stochastic Optimization Algorithms

1 code implementation • NeurIPS 2019 • Antonio Orvieto, Aurelien Lucchi

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods.

Stochastic Optimization
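A toy comparison (1-D quadratic, my own discretization choices) between noisy SGD iterates and an Euler-Maruyama simulation of the kind of SDE such continuous-time formulations use, dX = -∇f(X) dt + √η σ dW:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, sigma, steps = 0.1, 0.5, 200
grad = lambda x: x                        # f(x) = x^2 / 2

# SGD with additive gradient noise.
x_sgd = 2.0
for _ in range(steps):
    x_sgd -= eta * (grad(x_sgd) + sigma * rng.standard_normal())

# Euler-Maruyama discretization of dX = -grad f(X) dt + sqrt(eta) * sigma dW,
# with dt = eta so that one SDE step matches one SGD step.
x_sde, dt = 2.0, eta
for _ in range(steps):
    x_sde += -grad(x_sde) * dt + np.sqrt(dt) * np.sqrt(eta) * sigma * rng.standard_normal()

print(x_sgd, x_sde)                       # both fluctuate around the minimum at 0
```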
