A second-order-like optimizer with adaptive gradient scaling for deep learning

1 code implementation 8 Oct 2024 Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Andrei Purica

In this empirical article, we introduce INNAprop, an optimization algorithm that combines the INNA method with the RMSprop adaptive gradient scaling.

Image Classification, Language Modelling

Derivatives of Stochastic Gradient Descent in parametric optimization

no code implementations 24 May 2024 Franck Iutzeler, Edouard Pauwels, Samuel Vaiter

We investigate the behavior of the derivatives of the iterates of Stochastic Gradient Descent (SGD) with respect to the parameter on which the objective depends, and show that they are driven by an inexact SGD recursion on a different objective function, perturbed by the convergence of the original SGD.

Hyperparameter Optimization, Stochastic Optimization

Inexact subgradient methods for semialgebraic functions

no code implementations 30 Apr 2024 Jérôme Bolte, Tam Le, Éric Moulines, Edouard Pauwels

Motivated by the widespread use of approximate derivatives in machine learning and optimization, we study inexact subgradient methods with non-vanishing additive errors and step sizes.

Differentiating Nonsmooth Solutions to Parametric Monotone Inclusion Problems

no code implementations 15 Dec 2022 Jérôme Bolte, Edouard Pauwels, Antonio José Silveti-Falls

We leverage path differentiability and a recent result on nonsmooth implicit differentiation calculus to give sufficient conditions ensuring that the solution to a monotone inclusion problem will be path differentiable, with formulas for computing its generalized gradient.

The derivatives of Sinkhorn-Knopp converge

no code implementations 26 Jul 2022 Edouard Pauwels, Samuel Vaiter

We show that the derivatives of the Sinkhorn-Knopp algorithm, or iterative proportional fitting procedure, converge towards the derivatives of the entropic regularization of the optimal transport problem with a locally uniform linear convergence rate.

On the complexity of nonsmooth automatic differentiation

no code implementations 1 Jun 2022 Jérôme Bolte, Ryan Boustany, Edouard Pauwels, Béatrice Pesquet-Popescu

Using the notion of conservative gradient, we provide a simple model to estimate the computational costs of the backward and forward modes of algorithmic differentiation for a wide class of nonsmooth programs.

Automatic differentiation of nonsmooth iterative algorithms

no code implementations 31 May 2022 Jérôme Bolte, Edouard Pauwels, Samuel Vaiter

Is there a limiting object for nonsmooth piggyback automatic differentiation (AD)?

Path differentiability of ODE flows

no code implementations 11 Jan 2022 Swann Marx, Edouard Pauwels

We consider flows of ordinary differential equations (ODEs) driven by path differentiable vector fields.

Numerical influence of ReLU'(0) on backpropagation

1 code implementation NeurIPS 2021 David Bertoin, Jérôme Bolte, Sébastien Gerchinovitz, Edouard Pauwels

In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training.

Nonsmooth Implicit Differentiation for Machine Learning and Optimization

no code implementations NeurIPS 2021 Jérôme Bolte, Tam Le, Edouard Pauwels, Antonio Silveti-Falls

In view of training increasingly complex learning architectures, we establish a nonsmooth implicit function theorem with an operational calculus.

BIG-bench Machine Learning

Second-order step-size tuning of SGD for non-convex optimization

1 code implementation 5 Mar 2021 Camille Castera, Jérôme Bolte, Cédric Févotte, Edouard Pauwels

In view of a direct and simple improvement of vanilla SGD, this paper presents a fine-tuning of its step-sizes in the mini-batch case.

A Sublevel Moment-SOS Hierarchy for Polynomial Optimization

1 code implementation 13 Jan 2021 Tong Chen, Jean-Bernard Lasserre, Victor Magron, Edouard Pauwels

We introduce a sublevel Moment-SOS hierarchy where each SDP relaxation can be viewed as an intermediate (or interpolation) between the d-th and (d+1)-th order SDP relaxations of the Moment-SOS hierarchy (dense or sparse version).

Combinatorial Optimization, Optimization and Control

Sequential convergence of AdaGrad algorithm for smooth convex optimization

no code implementations 24 Nov 2020 Cheik Traoré, Edouard Pauwels

We prove that the iterates produced by either the scalar step size variant or the coordinatewise variant of the AdaGrad algorithm are convergent sequences when applied to convex objective functions with Lipschitz gradient.

A Hölderian backtracking method for min-max and min-min problems

no code implementations 17 Jul 2020 Jérôme Bolte, Lilian Glaudin, Edouard Pauwels, Mathieu Serrurier

We present a new algorithm to solve min-max or min-min problems beyond the convex setting.

Incremental Without Replacement Sampling in Nonconvex Optimization

no code implementations 15 Jul 2020 Edouard Pauwels

Minibatch decomposition methods for empirical risk minimization are commonly analysed in a stochastic approximation setting, also known as sampling with replacement.

A mathematical model for automatic differentiation in machine learning

no code implementations NeurIPS 2020 Jérôme Bolte, Edouard Pauwels

Automatic differentiation, as implemented today, does not have a simple mathematical model adapted to the needs of modern machine learning.

BIG-bench Machine Learning

Semialgebraic Optimization for Lipschitz Constants of ReLU Networks

2 code implementations NeurIPS 2020 Tong Chen, Jean-Bernard Lasserre, Victor Magron, Edouard Pauwels

The Lipschitz constant of a network plays an important role in many applications of deep learning, such as robustness certification and Wasserstein Generative Adversarial Network.

Adversarial Robustness

Rate of convergence for geometric inference based on the empirical Christoffel function

no code implementations 31 Oct 2019 Mai Trang Vu, François Bachoc, Edouard Pauwels

We consider the problem of estimating the support of a measure from a finite, independent sample.

Fairness with Wasserstein Adversarial Networks

no code implementations 25 Sep 2019 Mathieu Serrurier, Jean-Michel Loubes, Edouard Pauwels

For both models, we devise a learning algorithm based on approximation of Wasserstein distances using adversarial networks.


Conservative set valued fields, automatic differentiation, stochastic gradient method and deep learning

no code implementations 23 Sep 2019 Jérôme Bolte, Edouard Pauwels

Modern problems in AI or in numerical analysis require nonsmooth approaches with a flexible calculus.

Data analysis from empirical moments and the Christoffel function

no code implementations 19 Oct 2018 Edouard Pauwels, Mihai Putinar, Jean-Bernard Lasserre

Spectral features of the empirical moment matrix constitute a resourceful tool for unveiling properties of a cloud of points, including density, support and latent structures.

Relating Leverage Scores and Density using Regularized Christoffel Functions

no code implementations NeurIPS 2018 Edouard Pauwels, Francis Bach, Jean-Philippe Vert

Statistical leverage scores emerged as a fundamental tool for matrix sketching and column sampling with applications to low rank approximation, regression, random feature learning and quadrature.


The empirical Christoffel function with applications in data analysis

no code implementations 11 Jan 2017 Jean-Bernard Lasserre, Edouard Pauwels

Secondly, we provide a consistency result which relates the empirical Christoffel function and its population counterpart in the limit of large samples.

BIG-bench Machine Learning, Novelty Detection

