Search Results for author: Aurelien Lucchi

Found 65 papers, 25 papers with code

SDEs for Minimax Optimization

1 code implementation 19 Feb 2024 Enea Monzio Compagnoni, Antonio Orvieto, Hans Kersting, Frank Norbert Proske, Aurelien Lucchi

Minimax optimization problems have attracted a lot of attention over the past few years, with applications ranging from economics to machine learning.

Characterizing Overfitting in Kernel Ridgeless Regression Through the Eigenspectrum

no code implementations 2 Feb 2024 Tin Sum Cheng, Aurelien Lucchi, Anastasis Kratsios, David Belius

We derive new bounds for the condition number of kernel matrices, which we then use to enhance existing non-asymptotic test error bounds for kernel ridgeless regression in the over-parameterized regime for a fixed input dimension.


Regret-Optimal Federated Transfer Learning for Kernel Regression with Applications in American Option Pricing

1 code implementation 8 Sep 2023 Xuwei Yang, Anastasis Kratsios, Florian Krach, Matheus Grasselli, Aurelien Lucchi

We propose an optimal iterative scheme for federated transfer learning, where a central planner has access to datasets ${\cal D}_1,\dots,{\cal D}_N$ for the same learning model $f_{\theta}$.

Adversarial Robustness, Regression +1

Initial Guessing Bias: How Untrained Networks Favor Some Classes

no code implementations 1 Jun 2023 Emanuele Francazi, Aurelien Lucchi, Marco Baity-Jesi

Understanding and controlling biasing effects in neural networks is crucial for ensuring accurate and fair model performance.

An SDE for Modeling SAM: Theory and Insights

no code implementations 19 Jan 2023 Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, Frank Norbert Proske, Hans Kersting, Aurelien Lucchi

We study the SAM (Sharpness-Aware Minimization) optimizer which has recently attracted a lot of interest due to its increased performance over more classical variants of stochastic gradient descent.
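
The SAM update itself is easy to state. As a toy illustration (learning rate and rho are arbitrary here, and this sketch is unrelated to the paper's SDE analysis), each step first perturbs the weights along the normalized gradient toward higher loss, then descends using the gradient at that perturbed point:

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM step: ascend to the worst-case neighbor within an
    L2 ball of radius rho, then descend from that point."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # perturbation toward higher loss
    g_sharp = grad_fn(w + eps)                   # gradient at the perturbed point
    return w - lr * g_sharp

# Toy quadratic objective f(w) = 0.5 * ||w||^2, so grad f(w) = w.
grad = lambda w: w
w = np.array([1.0, -2.0])
for _ in range(100):
    w = sam_step(w, grad)
print(np.linalg.norm(w))  # shrinks toward the minimizer at the origin
```

With a fixed rho the iterates do not reach the exact minimizer; they settle in a small neighborhood of it, which is consistent with the noisy drift that an SDE model captures.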

Mastering Spatial Graph Prediction of Road Networks

no code implementations ICCV 2023 Sotiris Anagnostidis, Aurelien Lucchi, Thomas Hofmann

Accurately predicting road networks from satellite images requires a global understanding of the network topology.

Reinforcement Learning (RL)

A Theoretical Analysis of the Learning Dynamics under Class Imbalance

1 code implementation 1 Jul 2022 Emanuele Francazi, Marco Baity-Jesi, Aurelien Lucchi

We find that GD is not guaranteed to decrease the loss for each class but that this problem can be addressed by performing a per-class normalization of the gradient.
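
A minimal sketch of the per-class normalization idea on a toy logistic-regression problem (illustrative only, not the paper's exact algorithm): compute the gradient separately on each class and rescale each to unit norm before averaging, so the majority class cannot dominate the update.

```python
import numpy as np

def per_class_normalized_grad(w, X, y):
    """Average of per-class logistic-loss gradients, each rescaled to unit L2 norm."""
    grads = []
    for c in np.unique(y):
        Xc, yc = X[y == c], y[y == c]
        p = 1.0 / (1.0 + np.exp(-Xc @ w))        # sigmoid predictions on class c
        g = Xc.T @ (p - yc) / len(yc)            # gradient restricted to class c
        grads.append(g / (np.linalg.norm(g) + 1e-12))
    return np.mean(grads, axis=0)

rng = np.random.default_rng(0)
# Heavily imbalanced data: 95 negatives around (-1,-1), 5 positives around (1,1).
X = np.vstack([rng.normal(-1, 1, (95, 2)), rng.normal(1, 1, (5, 2))])
y = np.array([0] * 95 + [1] * 5)
w = np.zeros(2)
for _ in range(200):
    w -= 0.5 * per_class_normalized_grad(w, X, y)
```

Because both class gradients contribute equally, the learned direction is not swamped by the 95 majority-class points.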

Signal Propagation in Transformers: Theoretical Perspectives and the Role of Rank Collapse

no code implementations 7 Jun 2022 Lorenzo Noci, Sotiris Anagnostidis, Luca Biggio, Antonio Orvieto, Sidak Pal Singh, Aurelien Lucchi

First, we show that rank collapse of the tokens' representations hinders training by causing the gradients of the queries and keys to vanish at initialization.
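
Rank collapse can be illustrated with a toy experiment (a sketch of the phenomenon, not the paper's analysis): repeatedly applying a softmax self-attention mixing matrix, with no MLP or skip connections, drives all token representations toward a common point, i.e. toward a rank-one matrix.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def residual_to_rank_one(X):
    """Distance of X to the rank-one matrix whose rows all equal the mean token."""
    return np.linalg.norm(X - X.mean(axis=0, keepdims=True))

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))                     # 8 tokens, dimension 16
before = residual_to_rank_one(X)
for _ in range(20):                              # 20 attention-only "layers"
    A = softmax(X @ X.T / np.sqrt(X.shape[1]))   # row-stochastic attention weights
    X = A @ X                                    # pure token mixing
after = residual_to_rank_one(X)
print(before, after)                             # the residual shrinks dramatically
```

Each attention matrix is strictly positive and row-stochastic, so every layer contracts the tokens toward consensus; with vanished token diversity, query/key gradients degenerate as the paper describes.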

Phenomenology of Double Descent in Finite-Width Neural Networks

no code implementations ICLR 2022 Sidak Pal Singh, Aurelien Lucchi, Thomas Hofmann, Bernhard Schölkopf

`Double descent' delineates the generalization behaviour of models depending on the regime they belong to: under- or over-parameterized.

A Globally Convergent Evolutionary Strategy for Stochastic Constrained Optimization with Applications to Reinforcement Learning

no code implementations 21 Feb 2022 Youssef Diouane, Aurelien Lucchi, Vihang Patil

Evolutionary strategies have recently been shown to achieve competitive levels of performance for complex optimization problems in reinforcement learning.

Anticorrelated Noise Injection for Improved Generalization

no code implementations 6 Feb 2022 Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi

Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.

BIG-bench Machine Learning
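
The anticorrelated scheme of this paper replaces independent perturbations xi_k with their increments xi_{k+1} - xi_k. A minimal sketch on a toy quadratic (hyperparameters illustrative):

```python
import numpy as np

def anti_pgd(grad, x0, lr=0.05, sigma=0.1, steps=500, seed=0):
    """Perturbed GD where consecutive noise injections are anticorrelated:
    x_{k+1} = x_k - lr * grad(x_k) + (xi_{k+1} - xi_k)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    xi_prev = sigma * rng.normal(size=x.shape)
    for _ in range(steps):
        xi = sigma * rng.normal(size=x.shape)
        x = x - lr * grad(x) + (xi - xi_prev)
        xi_prev = xi
    return x

# Quadratic bowl f(x) = 0.5 * ||x||^2, grad f(x) = x.
x_final = anti_pgd(lambda x: x, x0=[3.0, -3.0])
```

Note that the injected increments telescope across iterations, so the noise does not accumulate the way i.i.d. perturbations would.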

Faster Single-loop Algorithms for Minimax Optimization without Strong Concavity

1 code implementation 10 Dec 2021 Junchi Yang, Antonio Orvieto, Aurelien Lucchi, Niao He

Gradient descent ascent (GDA), the simplest single-loop algorithm for nonconvex minimax optimization, is widely used in practical applications such as generative adversarial networks (GANs) and adversarial training.
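
For reference, GDA takes simultaneous gradient steps, descending in x and ascending in y. A sketch on a convex-concave toy problem (not the paper's faster single-loop variants):

```python
def gda(grad_x, grad_y, x0, y0, lr=0.1, steps=500):
    """Simultaneous gradient descent ascent: descend on x, ascend on y."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        gx, gy = grad_x(x, y), grad_y(x, y)   # evaluate both at the old point
        x, y = x - lr * gx, y + lr * gy
    return x, y

# f(x, y) = x*y - 0.5*y**2 is strongly concave in y, with saddle point (0, 0).
x, y = gda(lambda x, y: y, lambda x, y: x - y, x0=2.0, y0=-1.0)
```

On the purely bilinear f(x, y) = x*y the same simultaneous updates spiral away from the saddle point, which is part of why single-loop minimax methods require care beyond the strongly-concave setting.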

On the Second-order Convergence Properties of Random Search Methods

1 code implementation NeurIPS 2021 Aurelien Lucchi, Antonio Orvieto, Adamos Solomou

We prove that this approach converges to a second-order stationary point at a much faster rate than vanilla methods: namely, the complexity in terms of the number of function evaluations is only linear in the problem dimension.
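
The baseline being improved on evaluates the objective along random directions instead of using gradients. A minimal sketch of vanilla random search (step size and direction distribution illustrative; the paper's variant carries the second-order guarantees):

```python
import numpy as np

def random_search(f, x0, lr=0.2, mu=1e-3, steps=2000, seed=0):
    """Gradient-free descent: estimate a directional derivative by finite
    differences along a random Gaussian direction, then step against it."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        u = rng.normal(size=x.shape)
        d = (f(x + mu * u) - f(x)) / mu   # two function evaluations per step
        x = x - lr * d * u
    return x

f = lambda x: 0.5 * np.sum(x ** 2)        # toy quadratic
x = random_search(f, x0=[2.0, -1.0, 0.5])
```

Each iteration costs only two function evaluations, which is why the complexity is counted in function evaluations rather than gradient calls.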

Neural Symbolic Regression that Scales

2 code implementations 11 Jun 2021 Luca Biggio, Tommaso Bendinelli, Alexander Neitz, Aurelien Lucchi, Giambattista Parascandolo

We procedurally generate an unbounded set of equations, and simultaneously pre-train a Transformer to predict the symbolic equation from a corresponding set of input-output pairs.

Regression, Symbolic Regression

Vanishing Curvature and the Power of Adaptive Methods in Randomly Initialized Deep Networks

no code implementations 7 Jun 2021 Antonio Orvieto, Jonas Kohler, Dario Pavllo, Thomas Hofmann, Aurelien Lucchi

This paper revisits the so-called vanishing gradient phenomenon, which commonly occurs in deep randomly initialized neural networks.

Learning Generative Models of Textured 3D Meshes from Real-World Images

1 code implementation ICCV 2021 Dario Pavllo, Jonas Kohler, Thomas Hofmann, Aurelien Lucchi

Recent advances in differentiable rendering have sparked an interest in learning generative models of textured 3D meshes from image collections.

Pose Estimation

Generative Minimization Networks: Training GANs Without Competition

no code implementations 23 Mar 2021 Paulina Grnarova, Yannic Kilcher, Kfir Y. Levy, Aurelien Lucchi, Thomas Hofmann

Among the known problems experienced by practitioners are the lack of convergence guarantees and convergence to a non-optimal cycle.

Direct-Search for a Class of Stochastic Min-Max Problems

no code implementations 22 Feb 2021 Sotiris Anagnostidis, Aurelien Lucchi, Youssef Diouane

Recent applications in machine learning have renewed the interest of the community in min-max optimization problems.

Batch normalization provably avoids ranks collapse for randomly initialised deep networks

no code implementations NeurIPS 2020 Hadi Daneshmand, Jonas Kohler, Francis Bach, Thomas Hofmann, Aurelien Lucchi

Randomly initialized neural networks are known to become harder to train with increasing depth, unless architectural enhancements like residual connections and batch normalization are used.

Scalable Graph Networks for Particle Simulations

1 code implementation 14 Oct 2020 Karolis Martinkus, Aurelien Lucchi, Nathanaël Perraudin

However, the dynamics of many real-world systems are challenging to learn due to the presence of nonlinear potentials and a number of interactions that scales quadratically with the number of particles $N$, as in the case of the N-body problem.

An Accelerated DFO Algorithm for Finite-sum Convex Functions

no code implementations ICML 2020 Yu-Wen Chen, Antonio Orvieto, Aurelien Lucchi

Derivative-free optimization (DFO) has recently gained a lot of momentum in machine learning, spawning interest in the community to design faster methods for problems where gradients are not accessible.

Randomized Block-Diagonal Preconditioning for Parallel Learning

no code implementations ICML 2020 Celestine Mendler-Dünner, Aurelien Lucchi

We study preconditioned gradient-based optimization methods where the preconditioning matrix has block-diagonal form.
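
One classical instance of block-diagonal preconditioning is block Jacobi on a quadratic: invert only the diagonal blocks of the Hessian and use that as the preconditioner. This sketch is illustrative, not the paper's randomized construction:

```python
import numpy as np

rng = np.random.default_rng(1)
d, k = 6, 3                        # 6 parameters split into 3 blocks of size 2
Q = rng.normal(size=(d, d))
H = Q @ Q.T + d * np.eye(d)        # positive-definite quadratic Hessian
x_star = rng.normal(size=d)
grad = lambda x: H @ (x - x_star)  # gradient of 0.5*(x - x*)^T H (x - x*)

# Block-diagonal preconditioner: invert only the diagonal blocks of H.
# Each block can be inverted independently, hence in parallel.
P = np.zeros_like(H)
for j in range(0, d, d // k):
    s = slice(j, j + d // k)
    P[s, s] = np.linalg.inv(H[s, s])

x = np.zeros(d)
for _ in range(300):
    x -= 0.3 * P @ grad(x)         # damped preconditioned gradient step
```

The appeal for parallel learning is that each worker only ever inverts and applies its own block.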

Convolutional Generation of Textured 3D Meshes

1 code implementation NeurIPS 2020 Dario Pavllo, Graham Spinks, Thomas Hofmann, Marie-Francine Moens, Aurelien Lucchi

A key contribution of our work is the encoding of the mesh and texture as 2D representations, which are semantically aligned and can be easily modeled by a 2D convolutional GAN.

Emulation of cosmological mass maps with conditional generative adversarial networks

no code implementations 17 Apr 2020 Nathanaël Perraudin, Sandro Marcon, Aurelien Lucchi, Tomasz Kacprzak

Weak gravitational lensing mass maps play a crucial role in understanding the evolution of structures in the universe and our ability to constrain cosmological models.



Practical Accelerated Optimization on Riemannian Manifolds

no code implementations 11 Feb 2020 Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi

We develop a new Riemannian descent algorithm with an accelerated rate of convergence.

Optimization and Control

Controlling Style and Semantics in Weakly-Supervised Image Generation

1 code implementation ECCV 2020 Dario Pavllo, Aurelien Lucchi, Thomas Hofmann

We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene.

Conditional Image Generation

A Sub-sampled Tensor Method for Non-convex Optimization

no code implementations 23 Nov 2019 Aurelien Lucchi, Jonas Kohler

We present a stochastic optimization method that uses a fourth-order regularized model to find local minima of smooth and potentially non-convex objective functions with a finite-sum structure.

Stochastic Optimization

Shadowing Properties of Optimization Algorithms

1 code implementation NeurIPS 2019 Antonio Orvieto, Aurelien Lucchi

Ordinary differential equation (ODE) models of gradient-based optimization methods can provide insights into the dynamics of learning and inspire the design of new algorithms.

A Continuous-time Perspective for Modeling Acceleration in Riemannian Optimization

1 code implementation 23 Oct 2019 Foivos Alimisis, Antonio Orvieto, Gary Bécigneul, Aurelien Lucchi

We propose a novel second-order ODE as the continuous-time limit of a Riemannian accelerated gradient-based method on a manifold with curvature bounded from below.

Optimization and Control

Ellipsoidal Trust Region Methods for Neural Network Training

no code implementations 25 Sep 2019 Leonard Adolphs, Jonas Kohler, Aurelien Lucchi

We investigate the use of ellipsoidal trust region constraints for second-order optimization of neural networks.

Cosmological N-body simulations: a challenge for scalable generative models

1 code implementation 15 Aug 2019 Nathanaël Perraudin, Ankit Srivastava, Aurelien Lucchi, Tomasz Kacprzak, Thomas Hofmann, Alexandre Réfrégier

Our results show that the proposed model produces samples of high visual quality, although the statistical analysis reveals that capturing rare features in the data poses significant problems for the generative models.

The Role of Memory in Stochastic Optimization

no code implementations 2 Jul 2019 Antonio Orvieto, Jonas Kohler, Aurelien Lucchi

We first derive a general continuous-time model that can incorporate arbitrary types of memory, for both deterministic and stochastic settings.

Stochastic Optimization

Cosmological constraints with deep learning from KiDS-450 weak lensing maps

no code implementations 7 Jun 2019 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann, Aurel Schneider

We present the cosmological results with a CNN from the KiDS-450 tomographic weak lensing dataset, constraining the total matter density $\Omega_m$, the fluctuation amplitude $\sigma_8$, and the intrinsic alignment amplitude $A_{\rm{IA}}$.

Cosmology and Nongalactic Astrophysics

Adaptive norms for deep learning with regularized Newton methods

no code implementations 22 May 2019 Jonas Kohler, Leonard Adolphs, Aurelien Lucchi

We investigate the use of regularized Newton methods with adaptive norms for optimizing neural networks.

Evaluating GANs via Duality

no code implementations ICLR 2019 Paulina Grnarova, Kfir. Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Thomas Hofmann, Andreas Krause

Generative Adversarial Networks (GANs) have shown great results in accurately modeling complex distributions, but their training is known to be difficult due to instabilities caused by a challenging minimax optimization problem.

A domain agnostic measure for monitoring and evaluating GANs

1 code implementation NeurIPS 2019 Paulina Grnarova, Kfir. Y. Levy, Aurelien Lucchi, Nathanael Perraudin, Ian Goodfellow, Thomas Hofmann, Andreas Krause

Evaluations are essential for: (i) relative assessment of different models and (ii) monitoring the progress of a single model throughout training.

Continuous-time Models for Stochastic Optimization Algorithms

1 code implementation NeurIPS 2019 Antonio Orvieto, Aurelien Lucchi

We propose new continuous-time formulations for first-order stochastic optimization algorithms such as mini-batch gradient descent and variance-reduced methods.

Stochastic Optimization
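
The continuous-time view can be illustrated on a toy quadratic (a standard sketch of this modeling idea, not the paper's derivation): with learning rate eta, one SGD step has the same distribution as one Euler-Maruyama step of the SDE dX = -grad f(X) dt + sqrt(eta) * sigma dW taken with dt = eta.

```python
import numpy as np

rng = np.random.default_rng(0)
eta, sigma, T = 0.1, 0.5, 200
grad = lambda x: x                 # f(x) = 0.5 * x^2

# Discrete SGD: x_{k+1} = x_k - eta * (grad f(x_k) + sigma * Z_k)
x_sgd = 2.0
for _ in range(T):
    x_sgd -= eta * (grad(x_sgd) + sigma * rng.normal())

# Euler-Maruyama step of dX = -grad f(X) dt + sqrt(eta)*sigma dW with dt = eta:
# dX = -eta * grad f(X) + sqrt(eta) * sigma * sqrt(eta) * Z, identical in law to SGD.
x_sde = 2.0
for _ in range(T):
    x_sde += -eta * grad(x_sde) + eta * sigma * rng.normal()
```

Both chains hover in a noise-dominated neighborhood of the minimizer, which is the regime such continuous-time models are built to describe.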

Cosmological constraints from noisy convergence maps through deep learning

no code implementations 23 Jul 2018 Janis Fluri, Tomasz Kacprzak, Aurelien Lucchi, Alexandre Refregier, Adam Amara, Thomas Hofmann

We find that, for a shape noise level corresponding to 8.53 galaxies/arcmin$^2$ and the smoothing scale of $\sigma_s = 2.34$ arcmin, the network is able to generate 45% tighter constraints.

Cosmology and Nongalactic Astrophysics

A Distributed Second-Order Algorithm You Can Trust

no code implementations ICML 2018 Celestine Dünner, Aurelien Lucchi, Matilde Gargiani, An Bian, Thomas Hofmann, Martin Jaggi

Due to the rapid growth of data and computational resources, distributed optimization has become an active research area in recent years.

Distributed Optimization, Second-order Methods

Adversarially Robust Training through Structured Gradient Regularization

no code implementations 22 May 2018 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

We propose a novel data-dependent structured gradient regularizer to increase the robustness of neural networks vis-a-vis adversarial perturbations.

Local Saddle Point Optimization: A Curvature Exploitation Approach

1 code implementation 15 May 2018 Leonard Adolphs, Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Gradient-based optimization methods are the most popular choice for finding local optima for classical minimization and saddle point problems.

Escaping Saddles with Stochastic Gradients

no code implementations ICML 2018 Hadi Daneshmand, Jonas Kohler, Aurelien Lucchi, Thomas Hofmann

We analyze the variance of stochastic gradients along negative curvature directions in certain non-convex machine learning models and show that stochastic gradients exhibit a strong component along these directions.

Fast cosmic web simulations with generative adversarial networks

no code implementations 27 Jan 2018 Andres C. Rodriguez, Tomasz Kacprzak, Aurelien Lucchi, Adam Amara, Raphael Sgier, Janis Fluri, Thomas Hofmann, Alexandre Réfrégier

Computational models of the underlying physical processes, such as classical N-body simulations, are extremely resource intensive, as they track the action of gravity in an expanding universe using billions of particles as tracers of the cosmic matter distribution.

Fast Point Spread Function Modeling with Deep Learning

no code implementations 23 Jan 2018 Jörg Herbel, Tomasz Kacprzak, Adam Amara, Alexandre Refregier, Aurelien Lucchi

We find that our approach is able to accurately reproduce the SDSS PSF at the pixel level, which, due to the speed of both the model evaluation and the parameter estimation, offers good prospects for incorporating our method into the $MCCL$ framework.

Semantic Interpolation in Implicit Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

In implicit models, one often interpolates between sampled points in latent space.

Flexible Prior Distributions for Deep Generative Models

no code implementations ICLR 2018 Yannic Kilcher, Aurelien Lucchi, Thomas Hofmann

We consider the problem of training generative models with deep neural networks as generators, i.e. to map latent codes to data points.

Learning Aerial Image Segmentation from Online Maps

2 code implementations 21 Jul 2017 Pascal Kaiser, Jan Dirk Wegner, Aurelien Lucchi, Martin Jaggi, Thomas Hofmann, Konrad Schindler

We adapt a state-of-the-art CNN architecture for semantic segmentation of buildings and roads in aerial images, and compare its performance when using different training data sets, ranging from manually labeled, pixel-accurate ground truth of the same city to automatic training data derived from OpenStreetMap data from distant locations.

General Classification, Image Segmentation +2

Cosmological model discrimination with Deep Learning

no code implementations 17 Jul 2017 Jorit Schmelzle, Aurelien Lucchi, Tomasz Kacprzak, Adam Amara, Raphael Sgier, Alexandre Réfrégier, Thomas Hofmann

We find that our implementation of DCNN outperforms the skewness and kurtosis statistics, especially for high noise levels.

Stabilizing Training of Generative Adversarial Networks through Regularization

1 code implementation NeurIPS 2017 Kevin Roth, Aurelien Lucchi, Sebastian Nowozin, Thomas Hofmann

Deep generative models based on Generative Adversarial Networks (GANs) have demonstrated impressive sample quality but in order to work they require a careful choice of architecture, parameter initialization, and selection of hyper-parameters.

Image Generation

Sub-sampled Cubic Regularization for Non-convex Optimization

1 code implementation ICML 2017 Jonas Moritz Kohler, Aurelien Lucchi

This approach is particularly attractive because it escapes strict saddle points and provides stronger convergence guarantees than first- and second-order methods as well as classical trust region methods.

A Semi-supervised Framework for Image Captioning

1 code implementation 16 Nov 2016 Wenhu Chen, Aurelien Lucchi, Thomas Hofmann

We here propose a novel way of using such textual data by artificially generating missing visual information.

Image Captioning, Word Embeddings

Radio frequency interference mitigation using deep convolutional neural networks

3 code implementations 28 Sep 2016 Joel Akeret, Chihway Chang, Aurelien Lucchi, Alexandre Refregier

We employ a special type of Convolutional Neural Network, the U-Net, that enables the classification of clean signal and RFI signatures in 2D time-ordered data acquired from a radio telescope.

Instrumentation and Methods for Astrophysics

DynaNewton - Accelerating Newton's Method for Machine Learning

no code implementations 20 May 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

Solutions on this path are tracked such that the minimizer of the previous objective is guaranteed to be within the quadratic convergence region of the next objective to be optimized.

BIG-bench Machine Learning

Starting Small -- Learning with Adaptive Sample Sizes

no code implementations 9 Mar 2016 Hadi Daneshmand, Aurelien Lucchi, Thomas Hofmann

For many machine learning problems, data is abundant and it may be prohibitive to make multiple passes through the full training set.

BIG-bench Machine Learning

Probabilistic Bag-Of-Hyperlinks Model for Entity Linking

1 code implementation 8 Sep 2015 Octavian-Eugen Ganea, Marina Ganea, Aurelien Lucchi, Carsten Eickhoff, Thomas Hofmann

We demonstrate the accuracy of our approach on a wide range of benchmark datasets, showing that it matches, and in many cases outperforms, existing state-of-the-art methods.

Entity Disambiguation, Entity Linking +3

Variance Reduced Stochastic Gradient Descent with Neighbors

no code implementations NeurIPS 2015 Thomas Hofmann, Aurelien Lucchi, Simon Lacoste-Julien, Brian McWilliams

As a side-product we provide a unified convergence analysis for a family of variance reduction algorithms, which we call memorization algorithms.
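
A standard instance of this family of memorization algorithms is SAGA, sketched here on toy least-squares data (illustrative, not the paper's neighbor-sharing variant): keep the last gradient seen for every data point and use the table to de-bias each stochastic step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 5
A = rng.normal(size=(n, d))
b = rng.normal(size=n)

def grad_i(x, i):
    """Gradient of the i-th least-squares term 0.5 * (a_i @ x - b_i)^2."""
    return (A[i] @ x - b[i]) * A[i]

x = np.zeros(d)
memory = np.array([grad_i(x, i) for i in range(n)])  # one stored gradient per point
mem_avg = memory.mean(axis=0)
lr = 0.01
for _ in range(5000):
    i = rng.integers(n)
    g = grad_i(x, i)
    x -= lr * (g - memory[i] + mem_avg)   # variance-reduced update direction
    mem_avg += (g - memory[i]) / n        # keep the running average in sync
    memory[i] = g                         # memorize the fresh gradient

x_star = np.linalg.lstsq(A, b, rcond=None)[0]
```

Unlike plain SGD, the correction term drives the gradient-estimate variance to zero at the optimum, so a constant step size suffices for linear convergence.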


A Variance Reduced Stochastic Newton Method

no code implementations 28 Mar 2015 Aurelien Lucchi, Brian McWilliams, Thomas Hofmann

Quasi-Newton methods are widely used in practice for convex loss minimization problems.

Learning for Structured Prediction Using Approximate Subgradient Descent with Working Sets

no code implementations CVPR 2013 Aurelien Lucchi, Yunpeng Li, Pascal Fua

We propose a working set based approximate subgradient descent algorithm to minimize the margin-sensitive hinge loss arising from the soft constraints in max-margin learning frameworks, such as the structured SVM.

Image Segmentation, Semantic Segmentation +1
