Search Results for author: Pierre Ablin

Found 35 papers, 16 papers with code

Optimization without retraction on the random generalized Stiefel manifold

no code implementations • 2 May 2024 • Simon Vary, Pierre Ablin, Bin Gao, P.-A. Absil

Optimization over the set of matrices that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP).

Careful with that Scalpel: Improving Gradient Surgery with an EMA

no code implementations • 5 Feb 2024 • Yu-Guan Hsieh, James Thornton, Eugene Ndiaye, Michal Klein, Marco Cuturi, Pierre Ablin

Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g., performance on another dataset, robustness, agreement with a prior).

Specialized Language Models with Cheap Inference from Limited Domain Data

no code implementations • 2 Feb 2024 • David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun

Large language models have emerged as a versatile tool but are challenging to apply to tasks lacking large inference budgets and large in-domain training sets.

 Ranked #1 on Language Modelling on The Pile (Test perplexity metric)

Language Modelling

Understanding the Regularity of Self-Attention with Optimal Transport

no code implementations • 22 Dec 2023 • Valérie Castin, Pierre Ablin, Gabriel Peyré

This allows us to generalize attention to inputs of infinite length, and to derive an upper bound and a lower bound on the Lipschitz constant of self-attention on compact sets.

MultiView Independent Component Analysis with Delays

no code implementations • 1 Dec 2023 • Ambroise Heurtebise, Pierre Ablin, Alexandre Gramfort

Linear Independent Component Analysis (ICA) is a blind source separation technique that has been used in various domains to identify independent latent sources from observed signals.

blind source separation

A Challenge in Reweighting Data with Bilevel Optimization

no code implementations • 26 Oct 2023 • Anastasia Ivanova, Pierre Ablin

In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution.

Bilevel Optimization

Learning Elastic Costs to Shape Monge Displacements

no code implementations • 20 Jun 2023 • Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi

Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other.

Test like you Train in Implicit Deep Learning

no code implementations • 24 May 2023 • Zaccharie Ramzi, Pierre Ablin, Gabriel Peyré, Thomas Moreau

Implicit deep learning has recently gained popularity with applications ranging from meta-learning to Deep Equilibrium Networks (DEQs).


Infeasible Deterministic, Stochastic, and Variance-Reduction Algorithms for Optimization under Orthogonality Constraints

no code implementations • 29 Mar 2023 • Pierre Ablin, Simon Vary, Bin Gao, P.-A. Absil

Finally, our experiments demonstrate the promise of our approach to an array of machine-learning problems that involve orthogonality constraints.

Riemannian optimization

A Lower Bound and a Near-Optimal Algorithm for Bilevel Empirical Risk Minimization

no code implementations • 17 Feb 2023 • Mathieu Dagréou, Thomas Moreau, Samuel Vaiter, Pierre Ablin

Bilevel optimization problems, which are problems where two optimization problems are nested, have more and more applications in machine learning.

Bilevel Optimization

Monge, Bregman and Occam: Interpretable Optimal Transport in High-Dimensions with Feature-Sparse Maps

no code implementations • 8 Feb 2023 • Marco Cuturi, Michal Klein, Pierre Ablin

Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the "thriftiest", i.e., such that the averaged cost $c(x, T(x))$ between $x$ and its image $T(x)$ be as small as possible.

Dimensionality Reduction, MORPH

Do Residual Neural Networks discretize Neural Ordinary Differential Equations?

no code implementations • 29 May 2022 • Michael E. Sander, Pierre Ablin, Gabriel Peyré

As a byproduct of our analysis, we consider the use of a memory-free discrete adjoint method to train a ResNet by recovering the activations on the fly through a backward pass of the network, and show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with the input.

A framework for bilevel optimization that enables stochastic and global variance reduction algorithms

1 code implementation • 31 Jan 2022 • Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, Thomas Moreau

However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates.

Bilevel Optimization

Shared Independent Component Analysis for Multi-Subject Neuroimaging

1 code implementation • NeurIPS 2021 • Hugo Richard, Pierre Ablin, Bertrand Thirion, Alexandre Gramfort, Aapo Hyvärinen

While ShICA-J is based on second-order statistics, we further propose to leverage non-Gaussianity of the components using a maximum-likelihood method, ShICA-ML, that is both more accurate and more costly.


Sinkformers: Transformers with Doubly Stochastic Attention

1 code implementation • 22 Oct 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.

Image Classification

Kernel Stein Discrepancy Descent

2 code implementations • 20 May 2021 • Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, Pierre Ablin

We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant.

Adaptive Multi-View ICA: Estimation of noise levels for optimal inference

no code implementations • 22 Feb 2021 • Hugo Richard, Pierre Ablin, Aapo Hyvärinen, Alexandre Gramfort, Bertrand Thirion

By contrast, we propose Adaptive multiView ICA (AVICA), a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.


Fast and accurate optimization on the orthogonal manifold without retraction

1 code implementation • 15 Feb 2021 • Pierre Ablin, Gabriel Peyré

We consider the problem of minimizing a function over the manifold of orthogonal matrices.

Momentum Residual Neural Networks

1 code implementation • 15 Feb 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.

Image Classification

Deep orthogonal linear networks are shallow

no code implementations • 27 Nov 2020 • Pierre Ablin

We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in-between.

Spectral independent component analysis with noise modeling for M/EEG source separation

no code implementations • 21 Aug 2020 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

Signals are modelled as a linear mixing of independent sources corrupted by additive noise, where sources and the noise are stationary Gaussian time series.

Denoising, Dimensionality Reduction, +2

Modeling Shared Responses in Neuroimaging Studies through MultiView ICA

1 code implementation • NeurIPS 2020 • Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin

Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.


mvlearn: Multiview Machine Learning in Python

no code implementations • 25 May 2020 • Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein

As data are generated more and more from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years.

BIG-bench Machine Learning

Super-efficiency of automatic differentiation for functions defined as a minimum

no code implementations • ICML 2020 • Pierre Ablin, Gabriel Peyré, Thomas Moreau

In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm.

Learning step sizes for unfolded sparse coding

1 code implementation • NeurIPS 2019 • Pierre Ablin, Thomas Moreau, Mathurin Massias, Alexandre Gramfort

We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes.

Beyond Pham's algorithm for joint diagonalization

1 code implementation • 28 Nov 2018 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible.

Accelerating likelihood optimization for ICA on real signals

no code implementations • 25 Jun 2018 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA).

Stochastic algorithms with descent guarantees for ICA

1 code implementation • 25 May 2018 • Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach

We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.

Faster ICA under orthogonal constraint

1 code implementation • 29 Nov 2017 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences.

Faster independent component analysis by preconditioning with Hessian approximations

2 code implementations • 25 Jun 2017 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort

Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences.
