no code implementations • 14 Mar 2025 • Samuel Hurault, Matthieu Terris, Thomas Moreau, Gabriel Peyré
This allows us to rigorously track how the anisotropy of the data distribution (encoded by its power spectrum) interacts with key parameters of the end-to-end sampling method, including the noise amplitude, the step sizes in both score matching and diffusion, and the number of initial samples.
no code implementations • 30 Jan 2025 • Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré
This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers.
no code implementations • 15 Jan 2025 • Gabriel Peyré
This overview article highlights the critical role of mathematics in artificial intelligence (AI), emphasizing that mathematics provides tools to better understand and enhance AI systems.
no code implementations • 3 Oct 2024 • Michael E. Sander, Gabriel Peyré
Causal Transformers are trained to predict the next token for a given context.
no code implementations • 2 Aug 2024 • Takashi Furuya, Maarten V. de Hoop, Gabriel Peyré
A key aspect of our results, compared to existing findings, is that for a fixed precision, a single transformer can operate on an arbitrary (even infinite) number of tokens.
1 code implementation • 21 May 2024 • Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
Conservation laws are well-established in the context of Euclidean gradient flow dynamics, notably for linear or ReLU neural network training.
no code implementations • 19 Mar 2024 • Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard
We study the convergence of gradient flow for the training of deep neural networks.
1 code implementation • 26 Feb 2024 • Zhenzhang Ye, Gabriel Peyré, Daniel Cremers, Pierre Ablin
We study the error of the IFT method as a function of the error in the resolution of the inner problem.
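For context, the implicit function theorem (IFT) estimate in question differentiates the inner minimizer implicitly; in standard notation (ours, not the paper's), for $z^\star(x) = \arg\min_z f(x, z)$,

$$ \frac{\mathrm{d} z^\star}{\mathrm{d} x} = -\left[\partial^2_{zz} f(x, z^\star)\right]^{-1} \partial^2_{zx} f(x, z^\star), $$

so any error in an approximate inner solution $\hat z \approx z^\star(x)$ propagates through both second-derivative terms, which is the error the paper quantifies.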
no code implementations • 8 Feb 2024 • Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.
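A standard illustration of this phenomenon from the in-context-learning literature (not the paper's exact setting, which uses commuting orthogonal matrices and augmented tokens): for context tokens $(x_i, y_i)$ and the least-squares inner objective $L(W) = \frac{1}{2}\sum_i \|W x_i - y_i\|^2$, one gradient step from $W_0 = 0$ gives

$$ W_1 = \eta \sum_i y_i x_i^\top, \qquad W_1 x = \eta \sum_i \langle x_i, x \rangle\, y_i, $$

which is exactly a linear attention readout on the query $x$.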
no code implementations • 22 Dec 2023 • Valérie Castin, Pierre Ablin, Gabriel Peyré
When the sequence length $n$ is too large for the previous bound to be tight, which we refer to as the mean-field regime, we provide an upper bound and a matching lower bound which are independent of $n$.
no code implementations • 9 Nov 2023 • Othmane Sebbouh, Marco Cuturi, Gabriel Peyré
Matching a source to a target probability measure is often solved by instantiating a linear optimal transport (OT) problem, parameterized by a ground cost function that quantifies discrepancy between points.
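For reference, the linear (Kantorovich) OT problem with ground cost $c$ reads

$$ \mathrm{OT}_c(\mu, \nu) = \min_{\pi \in \Pi(\mu, \nu)} \int c(x, y)\, \mathrm{d}\pi(x, y), $$

where $\Pi(\mu, \nu)$ is the set of couplings whose marginals are $\mu$ and $\nu$.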
no code implementations • NeurIPS 2023 • Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
Besides, applying the two algorithms to a number of ReLU network architectures confirms that all known laws are recovered and that there are no other independent laws.
no code implementations • 24 May 2023 • Zaccharie Ramzi, Pierre Ablin, Gabriel Peyré, Thomas Moreau
Implicit deep learning has recently gained popularity with applications ranging from meta-learning to Deep Equilibrium Networks (DEQs).
no code implementations • 2 Feb 2023 • Michael E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel
In this paper, we propose new differentiable and sparse top-k operators.
no code implementations • 16 Nov 2022 • Thibault Séjourné, Gabriel Peyré, François-Xavier Vialard
Optimal Transport (OT) has recently emerged as a central tool in data sciences for comparing point clouds and, more generally, probability distributions in a geometrically faithful way.
no code implementations • 29 May 2022 • Michael E. Sander, Pierre Ablin, Gabriel Peyré
As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input.
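A minimal sketch of the recovery step this refers to (names ours): if the residual function is a contraction, the input $x$ can be recovered from $y = x + f(x)$ by fixed-point iteration, so activations need not be stored.

```python
import numpy as np

def invert_residual_block(f, y, n_iter=50):
    """Recover x from y = x + f(x) by fixed-point iteration.

    Converges when f is a contraction (Lipschitz constant < 1),
    matching the Lipschitz condition mentioned above.
    """
    x = y.copy()
    for _ in range(n_iter):
        x = y - f(x)
    return x

rng = np.random.default_rng(0)
W = 0.2 * rng.standard_normal((8, 8)) / np.sqrt(8)  # small Lipschitz constant
f = lambda t: np.tanh(W @ t)
x = rng.standard_normal(8)
y = x + f(x)                          # forward pass of one residual block
print(np.allclose(x, invert_residual_block(f, y)))  # True
```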
no code implementations • 3 May 2022 • Clarice Poon, Gabriel Peyré
Our main theoretical contribution connects gradient descent on this reformulation to a mirror descent flow with a varying Hessian metric.
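Schematically (notation ours), a mirror descent flow with potential $\varphi$ evolves as

$$ \frac{\mathrm{d}}{\mathrm{d} t} \nabla\varphi(x_t) = -\nabla f(x_t), \quad \text{i.e.} \quad \dot{x}_t = -\nabla^2\varphi(x_t)^{-1} \nabla f(x_t), $$

so a varying Hessian metric $\nabla^2\varphi$ reweights the gradient geometry along the trajectory.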
no code implementations • 3 Jan 2022 • Thibault Séjourné, François-Xavier Vialard, Gabriel Peyré
In this work, we identify the cause for this deficiency, namely the lack of a global normalization of the iterates, which equivalently corresponds to a translation of the dual OT potentials.
1 code implementation • 10 Dec 2021 • Raphaël Barboni, Gabriel Peyré, François-Xavier Vialard
To bridge the gap between the lazy and mean field regimes, we study Residual Networks (ResNets) in which the residual block has linear parametrization while still being nonlinear.
no code implementations • 25 Nov 2021 • Othmane Sebbouh, Marco Cuturi, Gabriel Peyré
RSGDA can be parameterized using optimal loop sizes that guarantee the best convergence rates known to hold for SGDA.
1 code implementation • 22 Oct 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.
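For readers unfamiliar with Sinkhorn normalization, here is a minimal sketch of the classical row/column normalization it refers to (variable names ours); by Sinkhorn's theorem the iterates converge to a doubly stochastic matrix for any positive input.

```python
import numpy as np

def sinkhorn_normalize(A, n_iter=20):
    """Alternately normalize rows and columns of a positive matrix."""
    A = A.copy()
    for _ in range(n_iter):
        A /= A.sum(axis=1, keepdims=True)   # rows sum to 1
        A /= A.sum(axis=0, keepdims=True)   # columns sum to 1
    return A

rng = np.random.default_rng(0)
attn = np.exp(rng.standard_normal((5, 5)))  # stand-in for exp(QK^T)
P = sinkhorn_normalize(attn)
print(P.sum(axis=0), P.sum(axis=1))         # both close to ones
```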
1 code implementation • NeurIPS 2021 • Clarice Poon, Gabriel Peyré
Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning.
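For context, a minimal IRLS sketch for the $\ell_1$-regularized least-squares problem, one common instance of such sparsity-enforcing regressions (the quadratic bound used is standard; names are ours).

```python
import numpy as np

def irls_lasso(A, b, lam, n_iter=100, eps=1e-8):
    """IRLS sketch for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    Uses the quadratic bound |x_i| <= x_i**2 / (2*eta_i) + eta_i / 2,
    so each iteration reduces to a weighted ridge regression.
    """
    eta = np.ones(A.shape[1])
    for _ in range(n_iter):
        H = A.T @ A + lam * np.diag(1.0 / (eta + eps))
        x = np.linalg.solve(H, A.T @ b)
        eta = np.abs(x)  # reweight with the current solution
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [1.0, -2.0, 0.5]
b = A @ x_true + 0.01 * rng.standard_normal(50)
print(np.round(irls_lasso(A, b, lam=0.1), 2))  # approximately sparse
```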
1 code implementation • NeurIPS 2021 • Meyer Scetbon, Gabriel Peyré, Marco Cuturi
The ability to align points across two related yet incomparable point clouds (e.g. living in different spaces) plays an important role in machine learning.
1 code implementation • 8 Mar 2021 • Meyer Scetbon, Marco Cuturi, Gabriel Peyré
Because matrix-vector products are pervasive in the Sinkhorn algorithm, several works have proposed to approximate kernel matrices appearing in its iterations using low-rank factors.
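A minimal sketch of how such low-rank factors enter the iterations, assuming nonnegative factors $U, V$ so that $UV^\top$ remains a valid kernel (names and signatures ours).

```python
import numpy as np

def sinkhorn_lowrank(a, b, U, V, n_iter=100):
    """Sinkhorn iterations with the kernel K = exp(-C/eps) replaced by
    low-rank factors, K ~ U @ V.T (U, V nonnegative, rank r).

    Every product K @ v becomes U @ (V.T @ v), costing O(n*r)
    per iteration instead of O(n^2).
    """
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (U @ (V.T @ v))
        v = b / (V @ (U.T @ u))
    return u, v  # approximate scaling vectors
```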
1 code implementation • 15 Feb 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show on CIFAR and ImageNet that Momentum ResNets match the accuracy of ResNets while having a much smaller memory footprint, and that pre-trained Momentum ResNets are promising for fine-tuning models (a minimal sketch of the update appears below).
Ranked #139 on Image Classification on CIFAR-10
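A minimal sketch of the momentum residual update and its exact inversion, assuming the form $v_{n+1} = \gamma v_n + (1-\gamma) f(x_n)$, $x_{n+1} = x_n + v_{n+1}$ described in the paper (names ours).

```python
import numpy as np

def momentum_step(f, x, v, gamma=0.9):
    """One momentum residual block: v <- gamma*v + (1-gamma)*f(x); x <- x + v."""
    v = gamma * v + (1.0 - gamma) * f(x)
    return x + v, v

def momentum_step_inverse(f, x_new, v_new, gamma=0.9):
    """Exact inversion of the step above: activations need not be
    stored during training, hence the memory savings."""
    x = x_new - v_new
    v = (v_new - (1.0 - gamma) * f(x)) / gamma
    return x, v

f = lambda t: np.tanh(t)                 # hypothetical residual function
x0, v0 = np.ones(4), np.zeros(4)
x1, v1 = momentum_step(f, x0, v0)
x0_rec, v0_rec = momentum_step_inverse(f, x1, v1)
print(np.allclose(x0, x0_rec), np.allclose(v0, v0_rec))  # True True
```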
1 code implementation • 15 Feb 2021 • Pierre Ablin, Gabriel Peyré
We consider the problem of minimizing a function over the manifold of orthogonal matrices.
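For context, a sketch of the classical baseline for this problem, Riemannian gradient descent with a QR retraction (names ours; not necessarily the paper's method, which works without retractions).

```python
import numpy as np

def orthogonal_gd(grad_f, X, lr=0.1, n_iter=100):
    """Standard Riemannian gradient descent on the orthogonal manifold,
    using a QR retraction after each tangent step."""
    for _ in range(n_iter):
        G = grad_f(X)
        S = (G @ X.T - X @ G.T) / 2.0  # skew-symmetric relative gradient
        X = X - lr * (S @ X)           # move along the tangent direction
        Q, R = np.linalg.qr(X)
        X = Q * np.sign(np.diag(R))    # retract back onto the manifold
    return X
```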
1 code implementation • 11 Feb 2021 • Geert-Jan Huizing, Laura Cantini, Gabriel Peyré
Optimal Transport (OT) lifts a distance between features (the "ground metric") to a geometrically meaningful distance between samples.
no code implementations • NeurIPS 2020 • Hicham Janati, Boris Muzellec, Gabriel Peyré, Marco Cuturi
Although optimal transport (OT) problems admit closed-form solutions in only a few notable cases, e.g. in 1D or between Gaussians, these closed forms have proved extremely fecund for practitioners to define tools inspired by the OT geometry.
2 code implementations • NeurIPS 2021 • Thibault Séjourné, François-Xavier Vialard, Gabriel Peyré
The GW distance is however limited to the comparison of metric measure spaces endowed with a probability distribution.
no code implementations • 24 Jun 2020 • Gwendoline de Bie, Herilalaina Rakotoarison, Gabriel Peyré, Michèle Sebag
On both tasks, Dida learns meta-features supporting the characterization of a (labelled) dataset.
no code implementations • NeurIPS 2020 • Lenaic Chizat, Pierre Roussillon, Flavien Léger, François-Xavier Vialard, Gabriel Peyré
We also propose and analyze an estimator based on Richardson extrapolation of the Sinkhorn divergence which enjoys improved statistical and computational efficiency guarantees, under a condition on the regularity of the approximation error, which is in particular satisfied for Gaussian densities.
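Schematically, if the divergence admits an expansion $S_\varepsilon = S_0 + c\,\varepsilon^p + o(\varepsilon^p)$ for some order $p$ depending on the regularity assumptions (our generic phrasing, not the paper's precise statement), Richardson extrapolation combines two regularization levels to cancel the leading bias:

$$ \widetilde{S} = \frac{2^p\, S_{\varepsilon/2} - S_\varepsilon}{2^p - 1} = S_0 + o(\varepsilon^p). $$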
1 code implementation • NeurIPS 2020 • Hicham Janati, Boris Muzellec, Gabriel Peyré, Marco Cuturi
Although optimal transport (OT) problems admit closed-form solutions in only a few notable cases, e.g. in 1D or between Gaussians, these closed forms have proved extremely fecund for practitioners to define tools inspired by the OT geometry.
Statistics Theory
no code implementations • NeurIPS 2020 • Arthur Mensch, Gabriel Peyré
Optimal Transport (OT) distances are now routinely used as loss functions in ML tasks.
no code implementations • 11 Feb 2020 • Kelvin Shuangjian Zhang, Gabriel Peyré, Jalal Fadili, Marcelo Pereyra
In this paper, we consider Langevin diffusions on a Hessian-type manifold and study a discretization that is closely related to the mirror-descent scheme.
no code implementations • ICML 2020 • Pierre Ablin, Gabriel Peyré, Thomas Moreau
In most cases, the minimum has no closed form, and an approximation is obtained via an iterative algorithm.
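For context, when $h(x) = \min_z f(x, z)$ with minimizer $z^\star(x)$, the envelope (Danskin) theorem gives

$$ \nabla h(x) = \nabla_x f(x, z^\star(x)), $$

so with an approximate minimizer $z_t$ one may either plug in $\nabla_x f(x, z_t)$ directly or differentiate through the iterations that produced $z_t$; comparing the accuracy of such gradient estimators is the kind of analysis the paper carries out.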
no code implementations • 8 Nov 2019 • Clarice Poon, Gabriel Peyré
Our main contribution is a proof of a continuous counterpart to this result for the Blasso.
1 code implementation • 8 Nov 2019 • Matthieu Heitz, Nicolas Bonneel, David Coeurjolly, Marco Cuturi, Gabriel Peyré
Optimal transport (OT) distances between probability distributions are parameterized by the ground metric they use between observations.
4 code implementations • 28 Oct 2019 • Thibault Séjourné, Jean Feydy, François-Xavier Vialard, Alain Trouvé, Gabriel Peyré
Optimal transport induces the Earth Mover's (Wasserstein) distance between probability distributions, a geometric divergence that is relevant to a wide range of problems.
no code implementations • 15 May 2019 • Arthur Mensch, Mathieu Blondel, Gabriel Peyré
Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.
1 code implementation • NeurIPS 2019 • Nicolas Keriven, Gabriel Peyré
In this paper, we consider a specific class of invariant and equivariant networks, for which we prove new universality theorems.
1 code implementation • 19 Nov 2018 • Gwendoline de Bie, Gabriel Peyré, Marco Cuturi
This makes it possible to design discriminative networks (to classify or reduce the dimensionality of input measures), generative architectures (to synthesize measures), and recurrent pipelines (to predict measure dynamics).
no code implementations • 13 Nov 2018 • Marco Cuturi, Gabriel Peyré
Variational problems that involve Wasserstein distances and more generally optimal transport (OT) theory are playing an increasingly important role in data sciences.
1 code implementation • 18 Oct 2018 • Jean Feydy, Thibault Séjourné, François-Xavier Vialard, Shun-ichi Amari, Alain Trouvé, Gabriel Peyré
Comparing probability distributions is a fundamental problem in data sciences.
Statistics Theory 62
5 code implementations • 1 Mar 2018 • Gabriel Peyré, Marco Cuturi
Optimal transport (OT) theory can be informally described using the words of the French mathematician Gaspard Monge (1746-1818): A worker with a shovel in hand has to move a large pile of sand lying on a construction site.
2 code implementations • 7 Aug 2017 • Morgan A. Schmitz, Matthieu Heitz, Nicolas Bonneel, Fred Maurice Ngolè Mboula, David Coeurjolly, Marco Cuturi, Gabriel Peyré, Jean-Luc Starck
The method reconstructs histograms using displacement interpolations (i.e. Wasserstein barycenters) between dictionary atoms; such atoms are themselves synthetic histograms in the probability simplex.
1 code implementation • 11 Jul 2017 • Jalal Fadili, Jérôme Malick, Gabriel Peyré
This pairing is crucial to track the strata that are identifiable by solutions of parametrized optimization problems or by iterates of optimization algorithms.
no code implementations • 6 Jun 2017 • Aude Genevay, Gabriel Peyré, Marco Cuturi
This short article revisits some of the ideas introduced in arXiv:1701.07875 and arXiv:1705.07642 in a simple setup.
2 code implementations • 1 Jun 2017 • Aude Genevay, Gabriel Peyré, Marco Cuturi
The ability to compare two degenerate probability distributions (i.e. two probability distributions supported on two distinct low-dimensional manifolds living in a much higher-dimensional space) is a crucial problem arising in the estimation of generative models for high-dimensional observations such as those arising in computer vision or natural language.
1 code implementation • 20 Dec 2016 • Gabriel Peyré, Lenaïc Chizat, François-Xavier Vialard, Justin Solomon
This "quantum" formulation of OT (Q-OT) corresponds to a relaxed version of the classical Kantorovich transport problem, where the fidelity between the input PSD-valued measures is captured using the geometry of the Von-Neumann quantum entropy.
Graphics
no code implementations • NeurIPS 2016 • Kévin Degraux, Gabriel Peyré, Jalal Fadili, Laurent Jacques
More precisely, we focus on the cases of $\ell_1$ and $\ell_\infty$ losses and contrast them with the usual $\ell_2$ loss. While these losses are routinely used to account for either sparse ($\ell_1$ loss) or uniform ($\ell_\infty$ loss) noise models, a theoretical analysis of their performance is still lacking.
no code implementations • NeurIPS 2016 • Jingwei Liang, Jalal Fadili, Gabriel Peyré
In this paper, we propose a multi-step inertial Forward-Backward splitting algorithm for minimizing the sum of two functions that are not necessarily convex, one of which is proper and lower semi-continuous while the other is differentiable with a Lipschitz continuous gradient.
no code implementations • 2 Nov 2016 • Jonathan Vacher, Andrew Isaac Meso, Laurent U. Perrinet, Gabriel Peyré
We use the dynamic texture model to psychophysically probe speed perception in humans using zoom-like changes in the spatial frequency content of the stimulus.
3 code implementations • 20 Jul 2016 • Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, François-Xavier Vialard
This article introduces a new class of fast algorithms to approximate variational problems involving unbalanced optimal transport.
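One instance of such scaling algorithms, sketched for the KL-penalized (unbalanced) entropic problem: the marginal constraints are relaxed, which changes the balanced Sinkhorn updates only by an exponent (names and defaults ours).

```python
import numpy as np

def unbalanced_sinkhorn(a, b, C, eps=0.1, rho=1.0, n_iter=200):
    """Scaling iterations for entropic unbalanced OT with KL marginal
    penalties of strength rho. The only change w.r.t. balanced
    Sinkhorn is the exponent rho / (rho + eps) on the updates."""
    K = np.exp(-C / eps)
    u, v = np.ones_like(a), np.ones_like(b)
    w = rho / (rho + eps)
    for _ in range(n_iter):
        u = (a / (K @ v)) ** w
        v = (b / (K.T @ u)) ** w
    return u[:, None] * K * v[None, :]  # (relaxed) transport plan
```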
Optimization and Control 65K10
no code implementations • NeurIPS 2016 • Aude Genevay, Marco Cuturi, Gabriel Peyré, Francis Bach
We instantiate these ideas in three different setups: (i) when comparing a discrete distribution to another, we show that incremental stochastic optimization schemes can beat Sinkhorn's algorithm, the current state-of-the-art finite-dimensional OT solver; (ii) when comparing a discrete distribution to a continuous density, a semi-discrete reformulation of the dual program is amenable to averaged stochastic gradient descent, leading to better performance than approximately solving the problem by discretization; (iii) when dealing with two continuous densities, we propose a stochastic gradient descent over a reproducing kernel Hilbert space (RKHS).
no code implementations • NeurIPS 2015 • Jonathan Vacher, Andrew Meso, Laurent U. Perrinet, Gabriel Peyré
We study here the principled construction of a generative model specifically crafted to probe motion perception.
1 code implementation • 21 Aug 2015 • Lenaic Chizat, Gabriel Peyré, Bernhard Schmitzer, François-Xavier Vialard
These distances are defined by two equivalent alternative formulations: (i) a "fluid dynamic" formulation defining the distance as a geodesic distance over the space of measures, and (ii) a static "Kantorovich" formulation where the distance is the minimum of an optimization program over pairs of couplings describing the transfer (transport, creation and destruction) of mass between two measures.
Optimization and Control
1 code implementation • 22 Jun 2015 • Lenaic Chizat, Bernhard Schmitzer, Gabriel Peyré, François-Xavier Vialard
This metric interpolates between the quadratic Wasserstein and the Fisher-Rao metrics and generalizes optimal transport to measures with different masses.
Analysis of PDEs
no code implementations • 30 Mar 2015 • Alexandre Gramfort, Gabriel Peyré, Marco Cuturi
Data are large, the geometry of the brain is complex, and the between-subject variability leads to spatially or temporally non-overlapping effects of interest.
1 code implementation • 9 Mar 2015 • Marco Cuturi, Gabriel Peyré
Variational problems that involve Wasserstein distances have been recently proposed to summarize and learn from probability measures.
1 code implementation • 16 Dec 2014 • Jean-David Benamou, Guillaume Carlier, Marco Cuturi, Luca Nenna, Gabriel Peyré
This article details a general numerical framework to approximate solutions to linear programs related to optimal transport.
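One well-known instance of this framework is the entropic Wasserstein barycenter computed by iterative Bregman projections; a minimal sketch (names ours), where each projection enforces one set of marginal constraints.

```python
import numpy as np

def barycenter_ibp(B, C, weights, eps=0.05, n_iter=200):
    """Entropic Wasserstein barycenter of histograms (the columns of B)
    via iterative Bregman projections. C is the (n, n) cost between
    bins and weights sums to one."""
    K = np.exp(-C / eps)                   # Gibbs kernel
    V = np.ones_like(B)
    for _ in range(n_iter):
        U = B / (K @ V)                    # project on the data marginals
        KtU = K.T @ U
        p = np.exp(np.log(KtU) @ weights)  # weighted geometric mean
        V = p[:, None] / KtU               # project on the shared marginal
    return p
```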
Numerical Analysis Analysis of PDEs
no code implementations • NeurIPS 2014 • Jingwei Liang, Jalal Fadili, Gabriel Peyré
In this paper, we consider the Forward-Backward proximal splitting algorithm to minimize the sum of two proper closed convex functions, one of which has a Lipschitz continuous gradient while the other is partly smooth relative to an active manifold $\mathcal{M}$.
no code implementations • 7 Jul 2014 • Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili
Inverse problems and regularization theory are a central theme in contemporary signal processing, where the goal is to reconstruct an unknown signal from partial, indirect, and possibly noisy measurements of it.
no code implementations • 5 May 2014 • Samuel Vaiter, Gabriel Peyré, Jalal M. Fadili
We show that a generalized "irrepresentable condition" implies stable model selection under small noise perturbations in the observations and the design matrix, when the regularization parameter is tuned proportionally to the noise level.
1 code implementation • 21 Jul 2013 • Sira Ferradans, Nicolas Papadakis, Gabriel Peyré, Jean-François Aujol
The resulting transportation plan can be used as a color transfer map, which is robust to mass variations across the color palettes of the images.
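A minimal sketch of one standard way to turn a transport plan into a color transfer map, the barycentric projection (names ours; the paper's relaxed formulation differs in how the plan itself is computed).

```python
import numpy as np

def color_transfer(P, target_colors):
    """Map each source color to the barycenter of the target colors it
    is coupled to by the plan P (barycentric projection)."""
    return (P @ target_colors) / P.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
P = rng.random((4, 4)); P /= P.sum()   # stand-in for a transport plan
target_palette = rng.random((4, 3))    # hypothetical RGB palette
print(color_transfer(P, target_palette))
```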