Search Results for author: Gabriel Peyré

Found 57 papers, 27 papers with code

How do Transformers perform In-Context Autoregressive Learning?

no code implementations8 Feb 2024 Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré

More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.

Language Modelling
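
Below is a toy numpy sketch of the inner-objective view described above (our illustration; the variable names and the specific least-squares objective are our assumptions, with a random rotation standing in for the commuting orthogonal case): one gradient-descent step from zero on the in-context least-squares objective already yields a usable next-token predictor.

```python
import numpy as np

rng = np.random.default_rng(0)

# Context generated by the autoregressive process s_{t+1} = W s_t,
# with W orthogonal (a random rotation).
d, T = 4, 32
W, _ = np.linalg.qr(rng.normal(size=(d, d)))
s = [rng.normal(size=d)]
for _ in range(T):
    s.append(W @ s[-1])
S = np.stack(s)                                 # (T+1, d) context tokens

# Inner objective: L(W') = (1/2T) * sum_t ||s_{t+1} - W' s_t||^2.
# Its gradient at W' = 0 is -(1/T) * sum_t s_{t+1} s_t^T, so one
# gradient-descent step from zero with step size eta gives:
eta = 1.0
W_gd = eta * (S[1:].T @ S[:-1]) / T

# Predict the next token from the last context token.
pred, true = W_gd @ S[-1], W @ S[-1]
cos = pred @ true / (np.linalg.norm(pred) * np.linalg.norm(true))
print("cosine similarity with the true next token: %.3f" % cos)
```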

Understanding the Regularity of Self-Attention with Optimal Transport

no code implementations22 Dec 2023 Valérie Castin, Pierre Ablin, Gabriel Peyré

This allows us to generalize attention to inputs of infinite length, and to derive an upper bound and a lower bound on the Lipschitz constant of self-attention on compact sets.

Structured Transforms Across Spaces with Cost-Regularized Optimal Transport

no code implementations9 Nov 2023 Othmane Sebbouh, Marco Cuturi, Gabriel Peyré

Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points.

Test like you Train in Implicit Deep Learning

no code implementations24 May 2023 Zaccharie Ramzi, Pierre Ablin, Gabriel Peyré, Thomas Moreau

Implicit deep learning has recently gained popularity with applications ranging from meta-learning to Deep Equilibrium Networks (DEQs).

Meta-Learning

Unbalanced Optimal Transport, from Theory to Numerics

no code implementations16 Nov 2022 Thibault Séjourné, Gabriel Peyré, François-Xavier Vialard

Optimal Transport (OT) has recently emerged as a central tool in data sciences to compare point clouds and, more generally, probability distributions in a geometrically faithful way.

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

no code implementations29 May 2022 Michael E. Sander, Pierre Ablin, Gabriel Peyré

As a byproduct of our analysis, we consider using a memory-free discrete adjoint method to train a ResNet, recovering the activations on the fly through a backward pass of the network, and we show that this method theoretically succeeds at large depth if the residual functions are Lipschitz continuous with respect to the input.
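
The recovery step admits a very small illustration (a sketch under our assumptions, not the paper's training code): a residual step x_{n+1} = x_n + f(x_n) can be inverted by the fixed-point iteration x ← x_{n+1} − f(x), which converges whenever f is a contraction; this is what lets a backward pass reconstruct activations instead of storing them.

```python
import numpy as np

rng = np.random.default_rng(0)

# A residual function with Lipschitz constant < 1 (residuals in deep
# ResNets are small, scaling roughly like 1/depth).
A = 0.3 * rng.normal(size=(8, 8)) / np.sqrt(8)
f = lambda x: np.tanh(A @ x)

x = rng.normal(size=8)
x_next = x + f(x)                    # forward residual step

# Recover x from x_next by fixed-point iteration x <- x_next - f(x).
z = x_next.copy()
for _ in range(50):
    z = x_next - f(z)
print("recovery error:", np.linalg.norm(z - x))   # near machine precision
```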

Smooth over-parameterized solvers for non-smooth structured optimization

no code implementations3 May 2022 Clarice Poon, Gabriel Peyré

Our main theoretical contribution connects gradient descent on this reformulation to a mirror descent flow with a varying Hessian metric.
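
As one concrete instance of such a smooth over-parameterization (our choice for illustration; the paper treats a general family), the Lasso can be rewritten with a Hadamard product x = u ⊙ v, using the identity ||x||_1 = min over u ⊙ v = x of (||u||^2 + ||v||^2)/2, and then minimized by plain gradient descent on a smooth objective:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 50, 100, 0.1
A = rng.normal(size=(n, p)) / np.sqrt(n)
x_true = np.zeros(p); x_true[:5] = 1.0
y = A @ x_true

# Smooth reformulation of min_x 0.5*||Ax - y||^2 + lam*||x||_1 via x = u * v:
# minimize 0.5*||A(u*v) - y||^2 + (lam/2)*(||u||^2 + ||v||^2) by gradient descent.
u, v = np.ones(p), np.zeros(p)
step = 0.1
for _ in range(20000):
    g = A.T @ (A @ (u * v) - y)
    u, v = u - step * (g * v + lam * u), v - step * (g * u + lam * v)

print("largest coefficients:", np.argsort(-np.abs(u * v))[:5])  # true support: 0..4
```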

Faster Unbalanced Optimal Transport: Translation invariant Sinkhorn and 1-D Frank-Wolfe

no code implementations3 Jan 2022 Thibault Séjourné, François-Xavier Vialard, Gabriel Peyré

In this work, we identify the cause for this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials.

Translation

Global convergence of ResNets: From finite to infinite width using linear parameterization

1 code implementation10 Dec 2021 Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard

To bridge the gap between the lazy and mean-field regimes, we study Residual Networks (ResNets) in which the residual block has a linear parametrization while still being nonlinear.

Randomized Stochastic Gradient Descent Ascent

no code implementations25 Nov 2021 Othmane Sebbouh, Marco Cuturi, Gabriel Peyré

RSGDA can be parameterized using optimal loop sizes that guarantee the best convergence rates known to hold for SGDA.

Sinkformers: Transformers with Doubly Stochastic Attention

1 code implementation22 Oct 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.

Image Classification
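
A minimal numpy sketch of the normalization in question (illustrative only; the actual Sinkformer applies this inside standard attention layers): Sinkhorn iterations replace the single row normalization of softmax with alternating row and column normalizations, producing an approximately doubly stochastic attention matrix.

```python
import numpy as np

def sinkhorn_attention(scores, n_iter=20):
    # Alternately normalize rows and columns of exp(scores); for a square
    # matrix the iterates converge to a doubly stochastic matrix.
    A = np.exp(scores - scores.max())
    for _ in range(n_iter):
        A /= A.sum(axis=1, keepdims=True)   # rows sum to 1 (softmax does only this)
        A /= A.sum(axis=0, keepdims=True)   # columns sum to 1
    return A

rng = np.random.default_rng(0)
S = rng.normal(size=(5, 5))                 # query-key attention scores
A = sinkhorn_attention(S)
print("row sums:", A.sum(1).round(3))
print("col sums:", A.sum(0).round(3))
```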

Linear-Time Gromov Wasserstein Distances using Low Rank Couplings and Costs

1 code implementation NeurIPS 2021 Meyer Scetbon, Gabriel Peyré, Marco Cuturi

The ability to align points across two related yet incomparable point clouds (e.g. living in different spaces) plays an important role in machine learning.

Smooth Bilevel Programming for Sparse Regularization

1 code implementation NeurIPS 2021 Clarice Poon, Gabriel Peyré

Iteratively reweighted least squares (IRLS) is a popular approach for solving sparsity-enforcing regression problems in machine learning.

Specificity
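
For reference, the classical IRLS iteration the abstract refers to, in minimal form (our sketch, using the standard quadratic bound |x_i| = min_{w_i > 0} (x_i^2/w_i + w_i)/2, with a small epsilon for stability):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam, eps = 40, 80, 0.1, 1e-6
A = rng.normal(size=(n, p)) / np.sqrt(n)
x_true = np.zeros(p); x_true[:4] = 1.0
y = A @ x_true + 0.01 * rng.normal(size=n)

# IRLS for min_x 0.5*||Ax - y||^2 + lam*||x||_1: alternate a weighted ridge
# solve in x with the closed-form weight update w_i = |x_i|.
x = np.zeros(p)
for _ in range(100):
    w = np.abs(x) + eps                        # reweighting (eps avoids 1/0)
    x = np.linalg.solve(A.T @ A + lam * np.diag(1.0 / w), A.T @ y)

print("largest coefficients:", np.argsort(-np.abs(x))[:4])   # true support: 0..3
```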

Low-Rank Sinkhorn Factorization

1 code implementation8 Mar 2021 Meyer Scetbon, Marco Cuturi, Gabriel Peyré

Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to approximate kernel matrices appearing in its iterations using low-rank factors.
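
The speed-up behind that line of work is easy to see in code (a sketch of the kernel-factorization idea the sentence mentions, not of this paper's low-rank-coupling algorithm; the truncated SVD below is just a stand-in for Nyström or random-feature factors):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r, eps = 500, 400, 50, 1.0
x, y = rng.normal(size=(n, 2)), rng.normal(size=(m, 2))
a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)

C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
K = np.exp(-C / eps)
U, s, Vt = np.linalg.svd(K, full_matrices=False)
U, V = U[:, :r] * s[:r], Vt[:r].T               # rank-r factors: K ~ U @ V.T

# Sinkhorn iterations where each kernel product costs O((n+m)r), not O(nm);
# the floor guards against tiny negative entries left by the truncation.
u, v = np.ones(n), np.ones(m)
for _ in range(300):
    u = a / np.maximum(U @ (V.T @ v), 1e-30)
    v = b / np.maximum(V @ (U.T @ u), 1e-30)

err = np.abs(u * (U @ (V.T @ v)) - a).max()
print("row-marginal error with the rank-%d kernel: %.2e" % (r, err))
```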

Fast and accurate optimization on the orthogonal manifold without retraction

1 code implementation15 Feb 2021 Pierre Ablin, Gabriel Peyré

We consider the problem of minimizing a function over the manifold of orthogonal matrices.
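
Below is a hedged sketch of a retraction-free update in this spirit (our reconstruction of the "landing"-type idea: a skew-symmetric relative-gradient term plus a penalty term that attracts the iterate back to the manifold, so no QR or exponential retraction is ever computed):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 10
A = rng.normal(size=(p, p)); A /= np.linalg.norm(A, 2)

# Minimize f(X) = -trace(A^T X) over orthogonal X; the optimum is the
# orthogonal polar factor U V^T of A = U S V^T.
grad_f = lambda X: -A

X = np.linalg.qr(rng.normal(size=(p, p)))[0]     # random orthogonal init
eta, lam = 0.2, 1.0
I = np.eye(p)
for _ in range(5000):
    G = grad_f(X)
    psi = 0.5 * (G @ X.T - X @ G.T)              # skew-symmetric relative gradient
    X = X - eta * (psi @ X + lam * X @ (X.T @ X - I))  # no retraction needed

U, _, Vt = np.linalg.svd(A)
print("distance to the manifold:", np.linalg.norm(X.T @ X - I))
print("distance to the optimum :", np.linalg.norm(X - U @ Vt))
```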

Momentum Residual Neural Networks

1 code implementation15 Feb 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.

Image Classification
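
The memory saving rests on exact invertibility of the momentum residual step; a minimal sketch (our notation): the forward update v' = γv + (1−γ)f(x), x' = x + v' can be undone in closed form, so activations need not be stored.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(8, 8)) / 8.0
f = lambda x: np.tanh(A @ x)                    # one residual block
gamma = 0.9

def forward(x, v):
    v = gamma * v + (1 - gamma) * f(x)
    return x + v, v

def inverse(x, v):
    x_prev = x - v                                   # undo x' = x + v'
    v_prev = (v - (1 - gamma) * f(x_prev)) / gamma   # undo v' = gamma*v + (1-gamma)*f(x)
    return x_prev, v_prev

x0, v0 = rng.normal(size=8), np.zeros(8)
x, v = x0, v0
for _ in range(20):                 # 20 "layers", nothing stored
    x, v = forward(x, v)
for _ in range(20):                 # exact reconstruction on the way back
    x, v = inverse(x, v)
print("reconstruction error:", np.linalg.norm(x - x0) + np.linalg.norm(v - v0))
```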

Unsupervised Ground Metric Learning using Wasserstein Singular Vectors

1 code implementation11 Feb 2021 Geert-Jan Huizing, Laura Cantini, Gabriel Peyré

Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples.

Clustering · Dimensionality Reduction +1

Faster Wasserstein Distance Estimation with the Sinkhorn Divergence

no code implementations NeurIPS 2020 Lenaic Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, Gabriel Peyré

We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities.

Computational Efficiency

Entropic Optimal Transport between (Unbalanced) Gaussian Measures has a Closed Form

1 code implementation NeurIPS 2020 Hicham Janati, Boris Muzellec, Gabriel Peyré, Marco Cuturi

Although optimal transport (OT) problems admit closed-form solutions in only a few notable cases, e.g. in 1D or between Gaussians, these closed forms have proved extremely fecund for practitioners to define tools inspired by the OT geometry.

Statistics Theory

Online Sinkhorn: Optimal Transport distances from sample streams

no code implementations NeurIPS 2020 Arthur Mensch, Gabriel Peyré

Optimal Transport (OT) distances are now routinely used as loss functions in ML tasks.

Wasserstein Control of Mirror Langevin Monte Carlo

no code implementations11 Feb 2020 Kelvin Shuangjian Zhang, Gabriel Peyré, Jalal Fadili, Marcelo Pereyra

In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme.
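
A common discretization of this type takes the Langevin step in the dual (mirror) coordinates; the sketch below is our reconstruction on a 1D example, not necessarily the exact scheme analyzed in the paper. With the entropic mirror map φ(x) = x log x − x on (0, ∞), one step reads y = log x − h V'(x) + √(2h/x) ξ, followed by x ← exp(y).

```python
import numpy as np

rng = np.random.default_rng(0)

# Target on (0, inf): Gamma(k, 1), i.e. potential V(x) = x - (k - 1) log x.
k = 3.0
Vp = lambda x: 1.0 - (k - 1.0) / x               # V'(x)

# Mirror map phi(x) = x log x - x: grad phi = log, its inverse = exp,
# Hessian phi = 1/x (which scales the injected noise).
h, n_steps, n_chains = 1e-2, 20000, 2000
x = np.ones(n_chains)
for _ in range(n_steps):
    y = np.log(x) - h * Vp(x) + np.sqrt(2.0 * h / x) * rng.normal(size=n_chains)
    x = np.exp(y)

print("sample mean %.2f (target %.2f)" % (x.mean(), k))   # Gamma(3,1) has mean 3
print("sample var  %.2f (target %.2f)" % (x.var(), k))    # and variance 3
```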

Super-efficiency of automatic differentiation for functions defined as a minimum

no code implementations ICML 2020 Pierre Ablin, Gabriel Peyré, Thomas Moreau

In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm.
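
A worked toy example of the phenomenon (our construction): for g(x, z) = ½(z − x)² + ½z², the minimum is h(x) = x²/4 with h'(x) = x/2. Comparing the "analytic" envelope estimate ∇ₓg(x, ẑₖ) with autodiff through the k unrolled iterations shows the autodiff error decaying at twice the rate, i.e. the super-efficiency of the title:

```python
import numpy as np

# h(x) = min_z g(x, z), g(x, z) = 0.5*(z - x)**2 + 0.5*z**2;
# closed form for checking: z*(x) = x/2 and h'(x) = x/2.
x = 1.0
z, dz = 0.0, 0.0               # iterate z_k and its derivative dz_k/dx (forward mode)
step = 0.25
print(" k   analytic error  autodiff error")
for k in range(1, 11):
    z, dz = z - step * (2 * z - x), dz - step * (2 * dz - 1)   # GD step, differentiated
    analytic = x - z                        # envelope estimate: grad_x g(x, z_k)
    autodiff = (x - z) + (2 * z - x) * dz   # d/dx of g(x, z_k(x))
    print("%2d   %.3e      %.3e" % (k, abs(analytic - x / 2), abs(autodiff - x / 2)))
```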

Ground Metric Learning on Graphs

1 code implementation8 Nov 2019 Matthieu Heitz, Nicolas Bonneel, David Coeurjolly, Marco Cuturi, Gabriel Peyré

Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations.

Metric Learning

Degrees of freedom for off-the-grid sparse estimation

no code implementations8 Nov 2019 Clarice Poon, Gabriel Peyré

Our main contribution is a proof of a continuous counterpart to this result for the Blasso.

Super-Resolution

Sinkhorn Divergences for Unbalanced Optimal Transport

4 code implementations28 Oct 2019 Thibault Séjourné, Jean Feydy, François-Xavier Vialard, Alain Trouvé, Gabriel Peyré

Optimal transport induces the Earth Mover's (Wasserstein) distance between probability distributions, a geometric divergence that is relevant to a wide range of problems.

Geometric Losses for Distributional Learning

no code implementations15 May 2019 Arthur Mensch, Mathieu Blondel, Gabriel Peyré

Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.

regression

Universal Invariant and Equivariant Graph Neural Networks

1 code implementation NeurIPS 2019 Nicolas Keriven, Gabriel Peyré

In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems.

Stochastic Deep Networks

1 code implementation19 Nov 2018 Gwendoline de Bie, Gabriel Peyré, Marco Cuturi

This makes it possible to design discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures) and recurrent pipelines (to predict measure dynamics).

Semi-dual Regularized Optimal Transport

no code implementations13 Nov 2018 Marco Cuturi, Gabriel Peyré

Variational problems that involve Wasserstein distances and more generally optimal transport (OT) theory are playing an increasingly important role in data sciences.

Interpolating between Optimal Transport and MMD using Sinkhorn Divergences

1 code implementation18 Oct 2018 Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré

Comparing probability distributions is a fundamental problem in data sciences.

Statistics Theory
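
The interpolating object is the debiased Sinkhorn divergence S_ε(α, β) = OT_ε(α, β) − ½ OT_ε(α, α) − ½ OT_ε(β, β), which recovers OT as ε → 0 and MMD as ε → ∞. A minimal numpy sketch on weighted point clouds (naive dense Sinkhorn; a fixed iteration count stands in for a convergence test):

```python
import numpy as np

def ot_eps(x, y, a, b, eps=0.5, n_iter=1000):
    # Entropic OT value via Sinkhorn; equals <f, a> + <g, b> at the fixed point.
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    K = np.exp(-C / eps)
    u, v = np.ones(len(a)), np.ones(len(b))
    for _ in range(n_iter):
        u = a / (K @ v)
        v = b / (K.T @ u)
    f, g = eps * np.log(u / a), eps * np.log(v / b)
    return f @ a + g @ b

def sinkhorn_divergence(x, y, a, b, eps=0.5):
    return (ot_eps(x, y, a, b, eps)
            - 0.5 * ot_eps(x, x, a, a, eps)
            - 0.5 * ot_eps(y, y, b, b, eps))

rng = np.random.default_rng(0)
x = rng.normal(size=(60, 2)); y = rng.normal(size=(50, 2)) + [2.0, 0.0]
a, b = np.full(60, 1 / 60), np.full(50, 1 / 50)
print("S_eps(x, y) = %.4f" % sinkhorn_divergence(x, y, a, b))
print("S_eps(x, x) = %.4f" % sinkhorn_divergence(x, x, a, a))  # 0 by debiasing
```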

Computational Optimal Transport

5 code implementations1 Mar 2018 Gabriel Peyré, Marco Cuturi

Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site.

MORPH

Sensitivity Analysis for Mirror-Stratifiable Convex Functions

1 code implementation11 Jul 2017 Jalal Fadili, Jérôme Malick, Gabriel Peyré

This pairing is crucial to track the strata that are identifiable by solutions of parametrized optimization problems or by iterates of optimization algorithms.

GAN and VAE from an Optimal Transport Point of View

no code implementations6 Jun 2017 Aude Genevay, Gabriel Peyré, Marco Cuturi

This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup.

Learning Generative Models with Sinkhorn Divergences

2 code implementations1 Jun 2017 Aude Genevay, Gabriel Peyré, Marco Cuturi

The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those encountered in computer vision or natural language.

Quantum Optimal Transport for Tensor Field Processing

1 code implementation20 Dec 2016 Gabriel Peyré, Lenaïc Chizat, François-Xavier Vialard, Justin Solomon

This "quantum" formulation of OT (Q-OT) corresponds to a relaxed version of the classical Kantorovich transport problem, where the fidelity between the input PSD-valued measures is captured using the geometry of the Von-Neumann quantum entropy.

Graphics

A Multi-step Inertial Forward-Backward Splitting Method for Non-convex Optimization

no code implementations NeurIPS 2016 Jingwei Liang, Jalal Fadili, Gabriel Peyré

In this paper, we propose a multi-step inertial Forward-Backward splitting algorithm for minimizing the sum of two functions that need not be convex, one of which is proper and lower semi-continuous while the other is differentiable with a Lipschitz continuous gradient.

BIG-bench Machine Learning
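
A minimal instance of the scheme (our sketch, on a convex Lasso problem for simplicity; the paper's analysis covers nonconvex settings as well): extrapolate along the last two directions, then take one proximal-gradient step.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, lam = 40, 80, 0.1
A = rng.normal(size=(n, p)) / np.sqrt(n)
y = A @ (np.arange(p) < 4).astype(float)
L = np.linalg.norm(A, 2) ** 2            # gradient Lipschitz constant of the smooth part
soft = lambda z, t: np.sign(z) * np.maximum(np.abs(z) - t, 0.0)  # prox of t*||.||_1

# Multi-step inertial forward-backward for min_x 0.5*||Ax - y||^2 + lam*||x||_1.
a1, a2, gamma = 0.3, 0.1, 1.0 / L
x, x1, x2 = np.zeros(p), np.zeros(p), np.zeros(p)   # x_k, x_{k-1}, x_{k-2}
for _ in range(500):
    w = x + a1 * (x - x1) + a2 * (x1 - x2)          # two-step inertial extrapolation
    x2, x1 = x1, x
    x = soft(w - gamma * (A.T @ (A @ w - y)), gamma * lam)   # forward-backward step

print("objective: %.6f" % (0.5 * np.linalg.norm(A @ x - y) ** 2 + lam * np.abs(x).sum()))
```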

Sparse Support Recovery with Non-smooth Loss Functions

no code implementations NeurIPS 2016 Kévin Degraux, Gabriel Peyré, Jalal Fadili, Laurent Jacques

More precisely, we focus on the cases of $\ell_1$ and $\ell_\infty$ losses, and contrast them with the usual $\ell_2$ loss. While these losses are routinely used to account for either sparse ($\ell_1$ loss) or uniform ($\ell_\infty$ loss) noise models, a theoretical analysis of their performance is still lacking.

Bayesian Modeling of Motion Perception using Dynamical Stochastic Textures

no code implementations2 Nov 2016 Jonathan Vacher, Andrew Isaac Meso, Laurent U. Perrinet, Gabriel Peyré

We use the dynamic texture model to psychophysically probe speed perception in humans using zoom-like changes in the spatial frequency content of the stimulus.

Bayesian Inference · Texture Synthesis

Scaling Algorithms for Unbalanced Transport Problems

3 code implementations20 Jul 2016 Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, François-Xavier Vialard

This article introduces a new class of fast algorithms to approximate variational problems involving unbalanced optimal transport.

Optimization and Control (MSC 65K10)
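
For KL marginal penalties the scaling iterations stay as cheap as balanced Sinkhorn; a minimal sketch (with ρ the marginal-relaxation strength, the balanced update a/(Kv) is simply raised to the power ρ/(ρ+ε)):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(30, 1)); y = rng.normal(size=(40, 1)) + 1.0
a = np.full(30, 1.0 / 30)                 # total mass 1
b = np.full(40, 2.0 / 40)                 # total mass 2: balanced OT is infeasible

eps, rho = 0.05, 1.0
K = np.exp(-(x - y.T) ** 2 / eps)
w = rho / (rho + eps)                     # exponent from the KL marginal penalty

u, v = np.ones(30), np.ones(40)
for _ in range(2000):
    u = (a / (K @ v)) ** w
    v = (b / (K.T @ u)) ** w

P = u[:, None] * K * v[None, :]
print("input masses 1.00 and 2.00; transported mass: %.2f" % P.sum())
```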

Stochastic Optimization for Large-scale Optimal Transport

no code implementations NeurIPS 2016 Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach

We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS).

Stochastic Optimization
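
Setup (ii) is the easiest to sketch: sample x from the continuous measure and take stochastic gradient-ascent steps on the entropic semi-dual over the discrete potentials (our minimal reconstruction; the step sizes and the Gaussian/uniform example are our choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Semi-discrete: alpha = N(0, 1) (continuous, sampled), beta uniform on m atoms.
m, eps = 8, 0.1
atoms = np.linspace(-2.0, 2.0, m)
b = np.full(m, 1.0 / m)

def cond_weights(x, v):
    # chi_j(x) proportional to b_j * exp((v_j - c(x, y_j)) / eps).
    s = (v - (x - atoms) ** 2) / eps + np.log(b)
    s -= s.max()
    e = np.exp(s)
    return e / e.sum()

# Averaged SGD: the stochastic gradient of the semi-dual at x is b - chi(x).
v, v_avg = np.zeros(m), np.zeros(m)
for k, xk in enumerate(rng.normal(size=200000), start=1):
    v += 0.05 / np.sqrt(k) * (b - cond_weights(xk, v))
    v_avg += (v - v_avg) / k

# At the optimum E_x[chi(x)] = b: check the marginal fit on fresh samples.
fit = np.mean([cond_weights(xk, v_avg) for xk in rng.normal(size=20000)], axis=0)
print("marginal fit error: %.3f" % np.abs(fit - b).max())
```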

Unbalanced Optimal Transport: Geometry and Kantorovich Formulation

1 code implementation21 Aug 2015 Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, François-Xavier Vialard

These distances are defined by two equivalent alternative formulations: (i) a "fluid dynamic" formulation defining the distance as a geodesic distance over the space of measures; (ii) a static "Kantorovich" formulation where the distance is the minimum of an optimization program over pairs of couplings describing the transfer (transport, creation and destruction) of mass between two measures.

Optimization and Control

An Interpolating Distance between Optimal Transport and Fisher-Rao

1 code implementation22 Jun 2015 Lenaic Chizat, Bernhard Schmitzer, Gabriel Peyré, François-Xavier Vialard

This metric interpolates between the quadratic Wasserstein and the Fisher-Rao metrics and generalizes optimal transport to measures with different masses.

Analysis of PDEs

Fast Optimal Transport Averaging of Neuroimaging Data

no code implementations30 Mar 2015 Alexandre Gramfort, Gabriel Peyré, Marco Cuturi

Data are large, the geometry of the brain is complex, and between-subject variability leads to spatially or temporally non-overlapping effects of interest.

A Smoothed Dual Approach for Variational Wasserstein Problems

1 code implementation9 Mar 2015 Marco Cuturi, Gabriel Peyré

Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures.

Iterative Bregman Projections for Regularized Transportation Problems

1 code implementation16 Dec 2014 Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Gabriel Peyré

This article details a general numerical framework to approximate solutions to linear programs related to optimal transport.

Numerical Analysis · Analysis of PDEs
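
In the standard two-marginal case the framework reduces to Sinkhorn's algorithm: alternating KL (Bregman) projections of the Gibbs kernel onto the two marginal constraint sets, each of which is a closed-form rescaling. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, eps = 5, 7, 0.1
C = rng.random((n, m))
a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)

# Alternating Bregman projections of K = exp(-C/eps) onto {P : P 1 = a}
# and {P : P^T 1 = b}; each KL projection is a row/column rescaling.
P = np.exp(-C / eps)
for _ in range(500):
    P *= (a / P.sum(axis=1))[:, None]
    P *= (b / P.sum(axis=0))[None, :]

print("marginal errors: %.1e / %.1e" % (np.abs(P.sum(1) - a).max(),
                                        np.abs(P.sum(0) - b).max()))
print("transport cost: %.4f" % (P * C).sum())
```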

Local Linear Convergence of Forward-Backward under Partial Smoothness

no code implementations NeurIPS 2014 Jingwei Liang, Jalal Fadili, Gabriel Peyré

In this paper, we consider the Forward-Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which has a Lipschitz continuous gradient while the other is partly smooth relative to an active manifold $\mathcal{M}$.

Low Complexity Regularization of Linear Inverse Problems

no code implementations7 Jul 2014 Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili

Inverse problems and regularization theory are a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial, indirect, and possibly noisy measurements of it.

Model Consistency of Partly Smooth Regularizers

no code implementations5 May 2014 Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili

We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level.

Model Selection

Regularized Discrete Optimal Transport

1 code implementation21 Jul 2013 Sira Ferradans, Nicolas Papadakis, Gabriel Peyré, Jean-François Aujol

The resulting transportation plan can be used as a color transfer map, which is robust to mass variations across the images' color palettes.

Colorization · Color Normalization
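
Given any transport plan P between the two palettes, the transfer map is the barycentric projection T(x_i) = (P Y)_i / (P 1)_i; the sketch below (our illustration, with an entropic plan standing in for the paper's regularized one) shows the mechanics on random "palettes":

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((200, 3))                   # source colors (RGB in [0, 1])
Y = 0.5 * rng.random((150, 3)) + 0.5       # target palette (brighter)
a, b = np.full(200, 1 / 200), np.full(150, 1 / 150)

# A transport plan between the palettes (a few Sinkhorn iterations here).
C = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
K = np.exp(-C / 0.05)
u, v = np.ones(200), np.ones(150)
for _ in range(300):
    u = a / (K @ v); v = b / (K.T @ u)
P = u[:, None] * K * v[None, :]

# Barycentric projection: each source color is sent to the P-weighted
# average of the target colors it is coupled with.
X_new = (P @ Y) / P.sum(axis=1, keepdims=True)
print("source mean color     :", X.mean(0).round(3))
print("transferred mean color:", X_new.mean(0).round(3))   # matches Y's statistics
```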
