no code implementations • 10 Jun 2025 • Waiss Azizian, Michael Kirchhof, Eugene Ndiaye, Louis Bethune, Michal Klein, Pierre Ablin, Marco Cuturi
Large Language Models (LLMs) have demonstrated impressive generalization capabilities across various tasks, but their practical relevance is still hampered by concerns about their reliability.
no code implementations • 27 Feb 2025 • Ambroise Heurtebise, Omar Chehab, Pierre Ablin, Alexandre Gramfort, Aapo Hyvärinen
We propose a novel approach to linear causal discovery in the framework of multi-view Structural Equation Models (SEM).
no code implementations • 3 Feb 2025 • Pierre Ablin, Angelos Katharopoulos, Skyler Seto, David Grangier
To train this architecture, we sample random domain weights, instantiate the corresponding model, and backpropagate through one batch of data sampled with these domain weights.
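A minimal sketch of one such training step; the interface is hypothetical (`make_model` and the per-domain iterators stand in for the paper's actual machinery):

```python
import torch

def training_step(make_model, domain_iters, optimizer, batch_size=32):
    # Hypothetical interface: `make_model(w)` instantiates the model tied to
    # domain weights `w`; `domain_iters[i]` yields example tensors from
    # domain i; the model returns a scalar loss on a batch.
    k = len(domain_iters)
    w = torch.distributions.Dirichlet(torch.ones(k)).sample()  # random weights
    model = make_model(w)
    # Draw each example from domain i with probability w[i].
    idx = torch.multinomial(w, batch_size, replacement=True)
    batch = torch.stack([next(domain_iters[i]) for i in idx.tolist()])
    loss = model(batch)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```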
no code implementations • 30 Jan 2025 • Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré
This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers.
no code implementations • 13 Jan 2025 • Ambroise Heurtebise, Omar Chehab, Pierre Ablin, Alexandre Gramfort
To address this, we propose Multi-View Independent Component Analysis with Delays and Dilations (MVICAD2), which allows sources to differ across subjects in both temporal delays and dilations.
no code implementations • 8 Oct 2024 • Michael Kirchhof, James Thornton, Pierre Ablin, Louis Béthune, Eugene Ndiaye, Marco Cuturi
We achieve this by adding repellency terms to the diffusion SDE throughout the generation trajectory, which are triggered whenever the path is expected to land too close to an image in the shielded reference set.
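An illustrative sketch of the mechanism, using a simple inverse-distance force as the repellency term (the paper's exact term and trigger condition differ):

```python
import torch

def repellent_drift(x, score, reference_set, strength=1.0, radius=1.0):
    # x: current sample (d,); score: model score at x (d,);
    # reference_set: (n, d) tensor of shielded images.
    diffs = x.unsqueeze(0) - reference_set           # (n, d)
    dists = diffs.norm(dim=1, keepdim=True)          # (n, 1)
    # Only references within `radius` exert a repellent force.
    mask = (dists < radius).float()
    repellency = (mask * diffs / (dists ** 2 + 1e-8)).sum(dim=0)
    return score + strength * repellency
```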
no code implementations • 3 Oct 2024 • Simin Fan, David Grangier, Pierre Ablin
DGA dynamically estimates the pre-training data mixture for which the model's gradients align best with those of the model on the specific task.
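A hedged sketch of such an alignment computation, using cosine similarity between flattened gradients and a softmax to produce mixture weights (the paper's estimator may differ):

```python
import torch

def domain_weights_by_alignment(domain_grads, task_grad, temperature=1.0):
    # domain_grads: list of flattened per-domain gradients;
    # task_grad: flattened gradient of the model on the target task.
    # Domains whose gradients align with the task gradient get larger weights.
    alignments = torch.stack(
        [torch.dot(g, task_grad) / (g.norm() * task_grad.norm() + 1e-8)
         for g in domain_grads]
    )
    return torch.softmax(alignments / temperature, dim=0)
```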
no code implementations • 30 Sep 2024 • David Grangier, Simin Fan, Skyler Seto, Pierre Ablin
In this work, we build specialist models from large generalist training sets instead.
1 code implementation • 6 Sep 2024 • Jason Ramapuram, Federico Danieli, Eeshan Dhekane, Floris Weers, Dan Busbridge, Pierre Ablin, Tatiana Likhomanenko, Jagrit Digani, Zijin Gu, Amitis Shidani, Russ Webb
Attention is a key part of the transformer architecture.
4 code implementations • 5 Sep 2024 • Matteo Pagliardini, Pierre Ablin, David Grangier
This work questions the use of a single EMA to accumulate past gradients and empirically demonstrates how this choice can be sub-optimal: a single EMA cannot simultaneously give a high weight to the immediate past, and a non-negligible weight to older gradients.
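A minimal sketch of the two-EMA remedy: a fast EMA tracks the immediate past while a slow EMA retains older gradients (constants are illustrative, not the paper's tuned values; the full optimizer also keeps Adam-style second moments, omitted here):

```python
import torch

def update(param, grad, state, lr=1e-3, beta1=0.9, beta3=0.9999, alpha=5.0):
    # Fast EMA: high weight on recent gradients.
    state["m_fast"] = beta1 * state["m_fast"] + (1 - beta1) * grad
    # Slow EMA: retains a non-negligible weight on much older gradients.
    state["m_slow"] = beta3 * state["m_slow"] + (1 - beta3) * grad
    # Combine both accumulators for the update direction.
    direction = state["m_fast"] + alpha * state["m_slow"]
    param -= lr * direction
```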
1 code implementation • 2 May 2024 • Simon Vary, Pierre Ablin, Bin Gao, P.-A. Absil
Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices, such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP).
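A quick numerical check of the constraint: generalized eigenvectors of a symmetric pair $(A, B)$ are $B$-orthonormal, so they yield a point on the generalized Stiefel manifold:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n, p = 6, 3
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)  # symmetric
B = rng.standard_normal((n, n)); B = B @ B.T + n * np.eye(n)  # SPD

# Solving the GEVP A x = lambda B x gives B-orthonormal eigenvectors.
_, V = eigh(A, B)
X = V[:, :p]  # a point on the generalized Stiefel manifold
assert np.allclose(X.T @ B @ X, np.eye(p), atol=1e-8)
```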
1 code implementation • 26 Feb 2024 • Zhenzhang Ye, Gabriel Peyré, Daniel Cremers, Pierre Ablin
As a function of the error of the inner problem resolution, we study the error of the IFT method.
no code implementations • 5 Feb 2024 • Yu-Guan Hsieh, James Thornton, Eugene Ndiaye, Michal Klein, Marco Cuturi, Pierre Ablin
Beyond minimizing a single training loss, many deep learning estimation pipelines rely on an auxiliary objective to quantify and encourage desirable properties of the model (e.g., performance on another dataset, robustness, agreement with a prior).
no code implementations • 2 Feb 2024 • David Grangier, Angelos Katharopoulos, Pierre Ablin, Awni Hannun
In the first scenario, we propose an effective solution based on importance sampling: we resample the pretraining set to imitate the specialization data and train a small model on it.
Ranked #1 on Language Modelling on The Pile (Test perplexity metric)
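A hedged sketch of the resampling step; how the importance weights are estimated (e.g. via an auxiliary model contrasting specialist and generalist data) is not shown here:

```python
import numpy as np

def resample_pretraining_set(pretrain_scores, num_samples, rng=None):
    # pretrain_scores: unnormalized importance weights, e.g. an estimate of
    # the ratio p_specialist(x) / p_generalist(x) for each pretraining
    # example. Examples resembling the specialization data are drawn more
    # often, so the resampled set imitates the specialization distribution.
    rng = rng or np.random.default_rng()
    probs = pretrain_scores / pretrain_scores.sum()
    return rng.choice(len(probs), size=num_samples, replace=True, p=probs)
```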
no code implementations • 22 Dec 2023 • Valérie Castin, Pierre Ablin, Gabriel Peyré
When the sequence length $n$ is too large for the previous bound to be tight, which we refer to as the mean-field regime, we provide an upper bound and a matching lower bound which are independent of $n$.
no code implementations • 1 Dec 2023 • Ambroise Heurtebise, Pierre Ablin, Alexandre Gramfort
Linear Independent Component Analysis (ICA) is a blind source separation technique that has been used in various domains to identify independent latent sources from observed signals.
no code implementations • 20 Nov 2023 • David Grangier, Pierre Ablin, Awni Hannun
Large neural networks pretrained on web-scale corpora are central to modern machine learning.
no code implementations • 26 Oct 2023 • Anastasia Ivanova, Pierre Ablin
In many scenarios, one uses a large training set to train a model with the goal of performing well on a smaller testing set with a different distribution.
no code implementations • 20 Jun 2023 • Michal Klein, Aram-Alexandre Pooladian, Pierre Ablin, Eugène Ndiaye, Jonathan Niles-Weed, Marco Cuturi
Given a source and a target probability measure supported on $\mathbb{R}^d$, the Monge problem asks to find the most efficient way to map one distribution to the other.
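For reference, with source measure $\mu$ and target measure $\nu$, the Monge problem reads
$$\min_{T\,:\,T_\sharp \mu = \nu} \int_{\mathbb{R}^d} c(x, T(x))\, \mathrm{d}\mu(x),$$
where $T_\sharp \mu$ denotes the pushforward of $\mu$ by $T$ and $c$ is a ground cost, typically $c(x, y) = \|x - y\|^2$.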
no code implementations • 24 May 2023 • Zaccharie Ramzi, Pierre Ablin, Gabriel Peyré, Thomas Moreau
Implicit deep learning has recently gained popularity with applications ranging from meta-learning to Deep Equilibrium Networks (DEQs).
1 code implementation • 29 Mar 2023 • Pierre Ablin, Simon Vary, Bin Gao, P.-A. Absil
Orthogonality constraints naturally appear in many machine learning problems, from principal component analysis to robust neural network training.
no code implementations • 17 Feb 2023 • Mathieu Dagréou, Thomas Moreau, Samuel Vaiter, Pierre Ablin
Bilevel optimization problems, in which two optimization problems are nested, have a growing number of applications in machine learning.
no code implementations • 8 Feb 2023 • Marco Cuturi, Michal Klein, Pierre Ablin
Optimal transport (OT) theory focuses, among all maps $T:\mathbb{R}^d\rightarrow \mathbb{R}^d$ that can morph a probability measure onto another, on those that are the "thriftiest", i.e., such that the averaged cost $c(x, T(x))$ between $x$ and its image $T(x)$ is as small as possible.
3 code implementations • 27 Jun 2022 • Thomas Moreau, Mathurin Massias, Alexandre Gramfort, Pierre Ablin, Pierre-Antoine Bannier, Benjamin Charlier, Mathieu Dagréou, Tom Dupré La Tour, Ghislain Durif, Cassio F. Dantas, Quentin Klopfenstein, Johan Larsson, En Lai, Tanguy Lefort, Benoit Malézieux, Badr Moufad, Binh T. Nguyen, Alain Rakotomamonjy, Zaccharie Ramzi, Joseph Salmon, Samuel Vaiter
Numerical validation is at the core of machine learning research, as it allows researchers to assess the actual impact of new methods and to confirm the agreement between theory and practice.
no code implementations • 29 May 2022 • Michael E. Sander, Pierre Ablin, Gabriel Peyré
As a byproduct of our analysis, we consider a memory-free discrete adjoint method that trains a ResNet by recovering the activations on the fly through a backward pass of the network. We show that this method theoretically succeeds at large depth if the residual functions are Lipschitz with respect to the input.
1 code implementation • 31 Jan 2022 • Mathieu Dagréou, Pierre Ablin, Samuel Vaiter, Thomas Moreau
However, computing the gradient of the value function involves solving a linear system, which makes it difficult to derive unbiased stochastic estimates.
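Concretely, writing $h(x) = F(x, y^\star(x))$ with $y^\star(x) = \arg\min_y G(x, y)$, the implicit function theorem gives
$$\nabla h(x) = \nabla_x F(x, y^\star) - \nabla^2_{xy} G(x, y^\star)\, v^\star, \qquad \text{where } \nabla^2_{yy} G(x, y^\star)\, v^\star = \nabla_y F(x, y^\star),$$
so each hypergradient requires the solution $v^\star$ of a linear system, and plugging stochastic estimates of $\nabla^2_{yy} G$ into the inverse yields biased gradient estimates.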
1 code implementation • NeurIPS 2021 • Hugo Richard, Pierre Ablin, Bertrand Thirion, Alexandre Gramfort, Aapo Hyvärinen
While ShICA-J is based on second-order statistics, we further propose to leverage non-Gaussianity of the components using a maximum-likelihood method, ShICA-ML, that is both more accurate and more costly.
2 code implementations • 22 Oct 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.
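A minimal sketch of the resulting normalization: alternating row and column normalizations of the positive matrix $\exp(\text{scores})$ (performed in log space for numerical stability) converge to a doubly stochastic matrix:

```python
import torch

def sinkhorn_attention(scores, n_iter=3):
    # scores: (..., n, n) attention logits; log_k is the log of the
    # positive kernel exp(scores) being normalized.
    log_k = scores
    for _ in range(n_iter):
        log_k = log_k - torch.logsumexp(log_k, dim=-1, keepdim=True)  # rows
        log_k = log_k - torch.logsumexp(log_k, dim=-2, keepdim=True)  # cols
    return torch.exp(log_k)  # approximately doubly stochastic
```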
2 code implementations • 20 May 2021 • Anna Korba, Pierre-Cyril Aubin-Frankowski, Szymon Majewski, Pierre Ablin
We investigate the properties of its Wasserstein gradient flow to approximate a target probability distribution $\pi$ on $\mathbb{R}^d$, known up to a normalization constant.
no code implementations • 22 Feb 2021 • Hugo Richard, Pierre Ablin, Aapo Hyvärinen, Alexandre Gramfort, Bertrand Thirion
By contrast, we propose Adaptive multiView ICA (AVICA), a noisy ICA model where each view is a linear mixture of shared independent sources with additive noise on the sources.
1 code implementation • 15 Feb 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.
Ranked #140 on Image Classification on CIFAR-10
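A hedged sketch of the underlying mechanism: a momentum residual layer is invertible in closed form, so activations can be recomputed during the backward pass instead of stored (scaling conventions may differ from the paper's):

```python
def momentum_forward(x, v, f, gamma=0.9):
    # One momentum residual layer:
    # v_{n+1} = gamma * v_n + (1 - gamma) * f(x_n); x_{n+1} = x_n + v_{n+1}.
    v = gamma * v + (1 - gamma) * f(x)
    x = x + v
    return x, v

def momentum_inverse(x, v, f, gamma=0.9):
    # Exact inversion of the step above: recover (x_n, v_n) from
    # (x_{n+1}, v_{n+1}) without storing intermediate activations.
    x = x - v
    v = (v - (1 - gamma) * f(x)) / gamma
    return x, v
```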
1 code implementation • 15 Feb 2021 • Pierre Ablin, Gabriel Peyré
We consider the problem of minimizing a function over the manifold of orthogonal matrices.
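For contrast, the standard retraction-based baseline that such methods seek to avoid looks as follows (this is NOT the paper's retraction-free update):

```python
import torch

def projected_gradient_step(X, grad, lr=0.1):
    # Riemannian gradient: project the Euclidean gradient onto the
    # tangent space of the orthogonal manifold at X.
    sym = (X.T @ grad + grad.T @ X) / 2
    rgrad = grad - X @ sym
    Y = X - lr * rgrad
    # QR retraction: map the step back onto the manifold.
    Q, R = torch.linalg.qr(Y)
    return Q * torch.sign(torch.diagonal(R))  # fix QR sign ambiguity
```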
no code implementations • 27 Nov 2020 • Pierre Ablin
We consider the problem of training a deep orthogonal linear network, which consists of a product of orthogonal matrices, with no non-linearity in-between.
no code implementations • 21 Aug 2020 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
Signals are modelled as a linear mixture of independent sources corrupted by additive noise, where both the sources and the noise are stationary Gaussian time series.
1 code implementation • NeurIPS 2020 • Hugo Richard, Luigi Gresele, Aapo Hyvärinen, Bertrand Thirion, Alexandre Gramfort, Pierre Ablin
Group studies involving large cohorts of subjects are important to draw general conclusions about brain functional organization.
no code implementations • 25 May 2020 • Ronan Perry, Gavin Mischler, Richard Guo, Theodore Lee, Alexander Chang, Arman Koul, Cameron Franz, Hugo Richard, Iain Carmichael, Pierre Ablin, Alexandre Gramfort, Joshua T. Vogelstein
As data are increasingly generated from multiple disparate sources, multiview data sets, where each sample has features in distinct views, have ballooned in recent years.
no code implementations • ICML 2020 • Pierre Ablin, Gabriel Peyré, Thomas Moreau
In most cases, the minimum has no closed-form, and an approximation is obtained via an iterative algorithm.
1 code implementation • NeurIPS 2019 • David Sabbagh, Pierre Ablin, Gael Varoquaux, Alexandre Gramfort, Denis A. Engemann
We show that Wasserstein and geometric distances allow perfect out-of-sample prediction on the generative models.
1 code implementation • NeurIPS 2019 • Pierre Ablin, Thomas Moreau, Mathurin Massias, Alexandre Gramfort
We demonstrate that for a large class of unfolded algorithms, if the algorithm converges to the solution of the Lasso, its last layers correspond to ISTA with learned step sizes.
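For reference, plain ISTA for the Lasso; unfolded variants replace the fixed step size below with per-layer learned values:

```python
import numpy as np

def ista(A, x, lam, n_iter=100):
    # Solves min_z 0.5 * ||x - A z||^2 + lam * ||z||_1 by iterating a
    # gradient step on the smooth part followed by soft-thresholding.
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / L with L = ||A||_2^2
    z = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = z - step * A.T @ (A @ z - x)                          # gradient
        z = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    return z
```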
1 code implementation • 28 Nov 2018 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
The approximate joint diagonalization of a set of matrices consists in finding a basis in which these matrices are as diagonal as possible.
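One common criterion (among several used in the literature) makes "as diagonal as possible" precise: minimize over $B$ the squared off-diagonal norm of the transformed matrices $B C_i B^\top$:

```python
import numpy as np

def off_criterion(B, matrices):
    # Sum of squared off-diagonal entries of B @ C @ B.T over all matrices;
    # approximate joint diagonalization seeks the basis B minimizing this.
    total = 0.0
    for C in matrices:
        D = B @ C @ B.T
        total += np.sum(D ** 2) - np.sum(np.diag(D) ** 2)
    return total
```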
1 code implementation • 6 Nov 2018 • Pierre Ablin, Dylan Fagot, Herwig Wendt, Alexandre Gramfort, Cédric Févotte
Nonnegative matrix factorization (NMF) is a popular method for audio spectral unmixing.
no code implementations • 25 Jun 2018 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
We study optimization methods for solving the maximum likelihood formulation of independent component analysis (ICA).
1 code implementation • 25 May 2018 • Pierre Ablin, Alexandre Gramfort, Jean-François Cardoso, Francis Bach
We derive an online algorithm for the streaming setting, and an incremental algorithm for the finite sum setting, with the following benefits.
1 code implementation • 29 Nov 2017 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data widely used in observational sciences.
2 code implementations • 25 Jun 2017 • Pierre Ablin, Jean-François Cardoso, Alexandre Gramfort
Independent Component Analysis (ICA) is a technique for unsupervised exploration of multi-channel data that is widely used in observational sciences.