no code implementations • 19 Jan 2023 • Enea Monzio Compagnoni, Antonio Orvieto, Luca Biggio, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi
We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted considerable interest due to its improved performance over more classical variants of stochastic gradient descent.
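As background, a SAM-style update first perturbs the weights along the normalized gradient and then descends using the gradient taken at that perturbed point. A minimal NumPy sketch of one such step (the loss, step size, and radius `rho` below are illustrative placeholders, not taken from the paper):

```python
import numpy as np

def sam_step(w, loss_grad, lr=0.1, rho=0.05):
    """One sharpness-aware step: ascend to a nearby 'worst-case' point,
    then apply the gradient computed there to the original weights."""
    g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent direction, radius rho
    g_adv = loss_grad(w + eps)                    # gradient at the perturbed point
    return w - lr * g_adv                         # descent step from the original w

# toy usage: minimize a simple quadratic
loss_grad = lambda w: 2 * w
w = np.array([3.0, -2.0])
for _ in range(50):
    w = sam_step(w, loss_grad)
print(w)  # ends up near the minimizer [0, 0]
```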
no code implementations • 19 Sep 2022 • Aurelien Lucchi, Frank Proske, Antonio Orvieto, Francis Bach, Hans Kersting
This generalizes processes based on Brownian motion, such as the Ornstein-Uhlenbeck process.
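For reference, the Ornstein-Uhlenbeck process solves the SDE dX_t = -theta X_t dt + sigma dW_t; a minimal Euler-Maruyama simulation, with illustrative parameter values:

```python
import numpy as np

def simulate_ou(theta=1.0, sigma=0.5, x0=2.0, dt=1e-2, n_steps=1000, seed=0):
    """Euler-Maruyama discretization of dX_t = -theta * X_t dt + sigma dW_t."""
    rng = np.random.default_rng(seed)
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt))        # Brownian increment
        x[k + 1] = x[k] - theta * x[k] * dt + sigma * dw
    return x

path = simulate_ou()
print(path[-1])  # fluctuates around 0, the mean-reversion level
```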
1 code implementation • 9 Jun 2022 • Antonio Orvieto, Anant Raj, Hans Kersting, Francis Bach
Injecting noise into gradient descent has several desirable effects, such as smoothing and regularization.
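A minimal sketch of the kind of scheme meant here, gradient descent with Gaussian perturbations, which effectively optimizes a smoothed version of the objective (the objective and hyperparameters are placeholders):

```python
import numpy as np

def perturbed_gd(grad, w0, lr=0.05, noise_std=0.1, n_steps=500, seed=0):
    """Gradient descent with Gaussian noise injected into each step."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    for _ in range(n_steps):
        xi = rng.normal(scale=noise_std, size=w.shape)  # injected noise
        w = w - lr * grad(w + xi)   # gradient evaluated at a perturbed point
    return w

# toy nonconvex objective f(w) = w^4 - 3 w^2, minima at +/- sqrt(1.5)
grad = lambda w: 4 * w**3 - 6 * w
print(perturbed_gd(grad, [0.1]))   # settles near one of the two minima
```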
no code implementations • 7 Jun 2022 • Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi
First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
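Rank collapse is easy to observe at initialization: stacking self-attention layers without residual branches or MLPs drives the token representations toward a rank-one matrix. A rough single-head NumPy illustration (all sizes and scalings are arbitrary choices, not the paper's setup):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, d = 32, 64
X = rng.normal(size=(n_tokens, d))          # random token representations

for layer in range(1, 13):
    Wq, Wk, Wv = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))
    A = softmax((X @ Wq) @ (X @ Wk).T / np.sqrt(d))   # row-stochastic attention
    X = A @ (X @ Wv)                                   # pure attention, no residual
    s = np.linalg.svd(X, compute_uv=False)
    print(layer, s[1] / s[0])   # second-to-first singular value ratio shrinks toward 0
```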
no code implementations • 6 Feb 2022 • Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi
Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.
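One scheme explored in this line of work injects perturbations whose consecutive increments are anticorrelated; a minimal sketch under that assumption (objective and hyperparameters are again placeholders):

```python
import numpy as np

def anti_pgd(grad, w0, lr=0.05, noise_std=0.1, n_steps=500, seed=0):
    """GD where the noise added at step k is xi_{k+1} - xi_k,
    so consecutive perturbations are anticorrelated."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    xi_prev = np.zeros_like(w)
    for _ in range(n_steps):
        xi = rng.normal(scale=noise_std, size=w.shape)
        w = w - lr * grad(w) + (xi - xi_prev)   # anticorrelated noise increments
        xi_prev = xi
    return w

grad = lambda w: 4 * w**3 - 6 * w               # same toy objective as above
print(anti_pgd(grad, [0.1]))                    # settles near one of the two minima
```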
no code implementations • 2 Jan 2022 • Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Thomas Hofmann, Josef Teichmann
Time series analysis is a widespread task in Natural Sciences, Social Sciences, and Engineering.
1 code implementation • 10 Dec 2021 • Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He
Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.
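For concreteness, GDA takes a descent step in the min variable and an ascent step in the max variable at every iteration, possibly with two different step sizes; a minimal sketch on a toy convex-concave saddle problem (objective and step sizes are illustrative):

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, lr_x=0.02, lr_y=0.05, n_steps=2000):
    """Single-loop gradient descent ascent for min_x max_y f(x, y)."""
    x, y = np.array(x0, float), np.array(y0, float)
    for _ in range(n_steps):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x = x - lr_x * gx       # descent on the min player
        y = y + lr_y * gy       # ascent on the max player
    return x, y

# toy objective f(x, y) = 0.5*x^2 + x*y - 0.5*y^2 (strongly convex-concave)
grad_x = lambda x, y: x + y
grad_y = lambda x, y: x - y
print(gda(grad_x, grad_y, [1.0], [1.0]))  # converges to the saddle point (0, 0)
```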
no code implementations • NeurIPS 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand
The continuous-time model of Nesterov's momentum provides a thought-provoking perspective for understanding the nature of the acceleration phenomenon in convex optimization.
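For reference, the ODE usually taken as this continuous-time model (following Su, Boyd, and Candès) is

```latex
\ddot{X}(t) + \frac{3}{t}\,\dot{X}(t) + \nabla f\bigl(X(t)\bigr) = 0,
\qquad X(0) = x_0, \quad \dot{X}(0) = 0,
```

for which, when f is convex, f(X(t)) - f^* decays at the accelerated rate O(1/t^2).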
1 code implementation • NeurIPS 2021 • Aurelien Lucchi, Antonio Orvieto, Adamos Solomou
We prove that this approach converges to a second-order stationary point at a much faster rate than vanilla methods: namely, the complexity in terms of the number of function evaluations is only linear in the problem dimension.
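A standard building block for such derivative-free methods is a gradient estimate formed from a random finite difference of function values; a minimal sketch of Gaussian-smoothing random search (this illustrates the general technique, not necessarily the paper's exact algorithm):

```python
import numpy as np

def zeroth_order_gd(f, x0, lr=0.1, mu=1e-3, n_steps=2000, seed=0):
    """Random search using one extra function evaluation per step:
    g_hat = (f(x + mu*u) - f(x)) / mu * u, with u standard Gaussian."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        u = rng.normal(size=x.shape)
        g_hat = (f(x + mu * u) - f(x)) / mu * u   # estimates the smoothed gradient
        x = x - lr * g_hat
    return x

f = lambda x: np.sum(x**2)                        # toy quadratic
print(zeroth_order_gd(f, np.ones(5)))             # close to the origin
```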
no code implementations • NeurIPS Workshop DLDE 2021 • Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto
Time series analysis is a widespread task in Natural Sciences, Social Sciences, and Engineering.
no code implementations • 7 Jun 2021 • Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi
This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks.
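The phenomenon is straightforward to reproduce: in a deep tanh network initialized with a scale below the critical 1/sqrt(width) value, gradient norms shrink geometrically with depth. A rough NumPy illustration (width, depth, and the init scale are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 128, 50

# forward pass through a deep tanh MLP, storing activations
x = rng.normal(size=width)
Ws, hs = [], [x]
for _ in range(depth):
    # init scale below the critical value, so signals and gradients contract
    W = rng.normal(size=(width, width)) * 0.5 / np.sqrt(width)
    Ws.append(W)
    hs.append(np.tanh(W @ hs[-1]))

# backpropagate a unit vector from the output and track the gradient norm
g = np.ones(width) / np.sqrt(width)
for layer in range(depth - 1, -1, -1):
    pre = Ws[layer] @ hs[layer]
    g = Ws[layer].T @ (g * (1 - np.tanh(pre) ** 2))   # chain rule through tanh
    if layer % 10 == 0:
        print(layer, np.linalg.norm(g))               # norm decays toward 0 with depth
```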
no code implementations • 23 Feb 2021 • Peiyuan Zhang, Antonio Orvieto, Hadi Daneshmand, Thomas Hofmann, Roy Smith
Viewing optimization methods as numerical integrators for ordinary differential equations (ODEs) provides a thought-provoking modern framework for studying accelerated first-order optimizers.
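The prototypical instance of this viewpoint is that gradient descent is the explicit Euler discretization of the gradient-flow ODE:

```latex
\dot{X}(t) = -\nabla f\bigl(X(t)\bigr)
\quad\xrightarrow{\ \text{explicit Euler, step } h\ }\quad
x_{k+1} = x_k - h\,\nabla f(x_k).
```

Accelerated first-order methods are then viewed as discretizations of second-order ODEs with damping, as in the Nesterov model above.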
no code implementations • 1 Nov 2020 • Nikolaos Tselepidis, Jonas Kohler, Antonio Orvieto
In the context of deep learning, many optimization methods use gradient covariance information in order to accelerate the convergence of Stochastic Gradient Descent.
3 code implementations • ICLR 2021 • Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schölkopf
In this paper, we investigate the principle that `good explanations are hard to vary' in the context of deep learning.
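One way to make "hard to vary" concrete is to update parameters only along directions on which the gradients of all training environments agree in sign; a rough sketch of such a sign-agreement mask (an illustration of the idea, not necessarily the paper's exact procedure):

```python
import numpy as np

def masked_gradient(env_grads, agreement=1.0):
    """Zero out parameter directions where the per-environment gradients
    disagree in sign; average the rest."""
    G = np.stack(env_grads)                        # shape (n_envs, n_params)
    frac_agree = np.abs(np.sign(G).mean(axis=0))   # 1.0 means all envs agree
    mask = (frac_agree >= agreement).astype(float)
    return mask * G.mean(axis=0)

# toy usage: the two environments disagree in sign only on the second coordinate
g1 = np.array([1.0,  2.0, -0.5])
g2 = np.array([0.5, -1.0, -0.1])
print(masked_gradient([g1, g2]))   # -> [0.75, 0.0, -0.3]
```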
no code implementations • ICML 2020 • Yu-Wen Chen, Antonio Orvieto, Aurelien Lucchi
Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spurring interest in the community in designing faster methods for problems where gradients are not accessible.
no code implementations • 11 Feb 2020 • Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi
We develop a new Riemannian descent algorithm with an accelerated rate of convergence.
Optimization and Control
1 code implementation • NeurIPS 2019 • Antonio Orvieto, Aurelien Lucchi
Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms.
1 code implementation • 23 Oct 2019 • Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi
We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below.
Optimization and Control
no code implementations • 2 Jul 2019 • Antonio Orvieto, Jonas Kohler, Aurelien Lucchi
We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings.
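In discrete time, "memory" here means update directions that aggregate past gradients; heavy-ball momentum is the special case of exponentially decaying memory. A minimal sketch under that reading (objective and hyperparameters are placeholders):

```python
import numpy as np

def gd_with_memory(grad, w0, lr=0.1, decay=0.9, n_steps=200):
    """Gradient descent whose update direction is an exponentially
    weighted sum of all past gradients (heavy-ball-style memory)."""
    w = np.array(w0, dtype=float)
    m = np.zeros_like(w)
    for _ in range(n_steps):
        m = decay * m + grad(w)      # memory: accumulated past gradients
        w = w - lr * m
    return w

grad = lambda w: 2 * (w - 3.0)       # toy quadratic centered at 3
print(gd_with_memory(grad, [0.0]))   # approaches 3
```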
1 code implementation • NeurIPS 2019 • Antonio Orvieto, Aurelien Lucchi
We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods.
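As a reminder of the discrete-time algorithms being modeled, here is a minimal sketch of SVRG, a typical variance-reduced method, on a toy finite-sum least-squares problem (an illustrative instance, not the paper's formulation):

```python
import numpy as np

def svrg(grads, w0, lr=0.02, n_epochs=30, inner_steps=None, seed=0):
    """SVRG: periodically compute the full gradient at a snapshot and use it
    as a control variate for the per-example stochastic gradients."""
    rng = np.random.default_rng(seed)
    n = len(grads)
    inner_steps = inner_steps or n
    w = np.array(w0, dtype=float)
    for _ in range(n_epochs):
        w_snap = w.copy()
        full_grad = sum(g(w_snap) for g in grads) / n       # gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(n)
            v = grads[i](w) - grads[i](w_snap) + full_grad  # variance-reduced estimate
            w = w - lr * v
    return w

# toy finite sum: f_i(w) = 0.5 * (a_i . w - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 5)), rng.normal(size=50)
grads = [lambda w, a=A[i], y=b[i]: (a @ w - y) * a for i in range(50)]
print(svrg(grads, np.zeros(5)))   # close to the least-squares solution of A w = b
```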