no code implementations • 30 Jan 2025 • Michael E. Sander, Vincent Roulet, Tianlin Liu, Mathieu Blondel
Energy-based models (EBMs) offer a flexible framework for parameterizing probability distributions using neural networks.
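For context, the basic construction behind this model class: a neural network $E_\theta$ plays the role of an energy function, and the model is the Gibbs distribution it induces,

$$ p_\theta(x) = \frac{\exp(-E_\theta(x))}{Z(\theta)}, \qquad Z(\theta) = \int \exp(-E_\theta(x))\,dx, $$

where the normalizing constant $Z(\theta)$ is typically intractable, which is the main difficulty in training and evaluating EBMs.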
no code implementations • 30 Jan 2025 • Vincent Roulet, Tianlin Liu, Nino Vieillard, Michael E. Sander, Mathieu Blondel
By analogy with the logistic loss, the loss function generated by an $f$-divergence is associated with an operator that we dub the $f$-softargmax.
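For reference, in the base case the operator associated with the logistic loss is the usual softargmax, in the sense that it appears in the loss's gradient; the $f$-softargmax generalizes this construction, and one expects to recover the formulas below in the KL-divergence case:

$$ \mathrm{softargmax}(\theta)_k = \frac{\exp(\theta_k)}{\sum_j \exp(\theta_j)}, \qquad \ell_{\mathrm{logistic}}(\theta, k) = \log\sum_j \exp(\theta_j) - \theta_k, \qquad \nabla_\theta \ell_{\mathrm{logistic}}(\theta, k) = \mathrm{softargmax}(\theta) - e_k. $$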
no code implementations • 8 Jul 2024 • Vincent Roulet, Atish Agarwala, Jean-Bastien Grill, Grzegorz Swirszcz, Mathieu Blondel, Fabian Pedregosa
These models break the stabilization of the sharpness, which we explain using a simplified model of the joint dynamics of the learning rate and the curvature.
no code implementations • 23 May 2024 • Seta Rakotomandimby, Jean-Philippe Chancelier, Michel De Lara, Mathieu Blondel
Can we build new loss functions associated with the same link function as Fenchel-Young losses?
1 code implementation • 21 Mar 2024 • Mathieu Blondel, Vincent Roulet
Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming.
no code implementations • 8 Feb 2024 • Pierre Marion, Anna Korba, Peter Bartlett, Mathieu Blondel, Valentin De Bortoli, Arnaud Doucet, Felipe Llinares-López, Courtney Paquette, Quentin Berthet
We present a new algorithm to optimize distributions defined implicitly by parameterized stochastic diffusions.
no code implementations • 8 Feb 2024 • Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré
More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.
no code implementations • 7 Feb 2024 • Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel
Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy.
no code implementations • 5 Feb 2024 • Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel
Aligning language models with human preferences is crucial for reducing errors and biases in these models.
no code implementations • 29 Jan 2024 • Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver
Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert.
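A minimal NumPy sketch of the two routing variants as described here; this is illustrative only, and the shapes, the top-$k$ value, and the capacity parameter are assumptions rather than the paper's implementation:

```python
import numpy as np

def token_choice(scores, k):
    """Token Choice: each token selects its top-k experts by router score.

    scores: [num_tokens, num_experts] router logits.
    Returns expert indices of shape [num_tokens, k].
    """
    return np.argsort(-scores, axis=1)[:, :k]

def expert_choice(scores, capacity):
    """Expert Choice: each expert selects its top-`capacity` tokens.

    Returns token indices of shape [num_experts, capacity].
    """
    return np.argsort(-scores, axis=0)[:capacity, :].T

scores = np.random.randn(8, 4)            # 8 tokens, 4 experts
print(token_choice(scores, k=2))          # experts chosen per token
print(expert_choice(scores, capacity=3))  # tokens chosen per expert
```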
no code implementations • 17 Aug 2023 • Vincent Roulet, Mathieu Blondel
Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely the composition of a convex loss function with a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization.
no code implementations • 2 Feb 2023 • Michael E. Sander, Joan Puigcerver, Josip Djolonga, Gabriel Peyré, Mathieu Blondel
In this paper, we propose new differentiable and sparse top-k operators.
no code implementations • 30 Sep 2022 • Tianlin Liu, Joan Puigcerver, Mathieu Blondel
The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.
no code implementations • 19 May 2022 • Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist
To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function.
no code implementations • 24 Feb 2022 • Robert M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa
We use this insight to develop new variants of the SPS method that are better suited to nonlinear models.
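For reference, a minimal sketch of the basic stochastic Polyak step size (SPS) that this line of work builds on; the paper's variants differ, and the target value `f_star`, the slack constant `c`, and the cap `gamma_max` below are illustrative assumptions:

```python
import numpy as np

def sps_step(w, grad, loss, f_star=0.0, c=0.5, gamma_max=1.0):
    """One SPS update: gamma = min(gamma_max, (f_i(w) - f_i*) / (c * ||grad||^2))."""
    g_norm_sq = np.dot(grad, grad)
    if g_norm_sq == 0.0:
        return w
    gamma = min(gamma_max, (loss - f_star) / (c * g_norm_sq))
    return w - gamma * grad
```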
2 code implementations • 22 Oct 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.
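A minimal sketch of the Sinkhorn normalization referred to here, which alternately normalizes the rows and columns of a positive matrix until it is approximately doubly stochastic; the number of iterations and the toy attention matrix are illustrative:

```python
import numpy as np

def sinkhorn(K, n_iter=10):
    """Alternate row/column normalization of a positive matrix K."""
    P = K.copy()
    for _ in range(n_iter):
        P /= P.sum(axis=1, keepdims=True)  # make rows sum to 1
        P /= P.sum(axis=0, keepdims=True)  # make columns sum to 1
    return P

A = np.exp(np.random.randn(5, 5))    # e.g., exponentiated attention logits
P = sinkhorn(A)
print(P.sum(axis=0), P.sum(axis=1))  # both close to 1
```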
1 code implementation • 4 Aug 2021 • André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae
In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support.
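A minimal sketch of one of these mappings, sparsemax, which projects the score vector onto the probability simplex and typically returns a sparse distribution, in contrast with softmax, whose output is always dense:

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of z onto the probability simplex."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = 1.0 + ks * z_sorted > cumsum   # largest prefix with 1 + k*z_(k) > sum_{j<=k} z_(j)
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1.0) / k
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.5, -1.0])))  # ~[0.75, 0.25, 0.0]: zero mass on the last entry
```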
2 code implementations • NeurIPS 2021 • Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert
In this paper, we propose automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems.
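The core idea, sketched on a toy problem (a generic illustration of implicit differentiation via optimality conditions, not the library's API): for ridge regression $x^\star(\lambda) = \arg\min_x \tfrac{1}{2}\|Ax - b\|^2 + \tfrac{\lambda}{2}\|x\|^2$, differentiating the stationarity condition $A^\top(Ax^\star - b) + \lambda x^\star = 0$ with respect to $\lambda$ yields a linear system for $\partial x^\star / \partial \lambda$:

```python
import numpy as np

rng = np.random.default_rng(0)
A, b, lam = rng.normal(size=(20, 5)), rng.normal(size=20), 0.1

H = A.T @ A + lam * np.eye(5)          # Hessian of the inner objective
x_star = np.linalg.solve(H, A.T @ b)   # solution of the inner problem

# Implicit differentiation: H @ dx/dlam + x_star = 0  =>  dx/dlam = -H^{-1} x_star
dx_dlam = np.linalg.solve(H, -x_star)

# Check against finite differences
eps = 1e-6
x_eps = np.linalg.solve(A.T @ A + (lam + eps) * np.eye(5), A.T @ b)
print(np.allclose(dx_dlam, (x_eps - x_star) / eps, atol=1e-4))  # True
```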
1 code implementation • 4 May 2021 • Quentin Bertrand, Quentin Klopfenstein, Mathurin Massias, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon
Finding the optimal hyperparameters of a model can be cast as a bilevel optimization problem, typically solved using zero-order techniques.
no code implementations • 17 Mar 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour
Second, we show that inverting permutations is a meaningful pretext task for learning audio representations in an unsupervised fashion.
1 code implementation • 15 Feb 2021 • Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré
We show on CIFAR and ImageNet that Momentum ResNets match the accuracy of ResNets while having a much smaller memory footprint, and that pre-trained Momentum ResNets are promising for fine-tuning models.
Ranked #139 on Image Classification on CIFAR-10
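For context, a minimal sketch of the momentum residual update studied in this line of work; because the forward rule can be algebraically inverted, activations can be recomputed rather than stored, which is the source of the memory savings (the momentum value `gamma` and the residual function `f` below are placeholders):

```python
def momentum_forward(x, v, f, gamma=0.9):
    """One momentum residual block: v' = gamma*v + (1-gamma)*f(x); x' = x + v'."""
    v = gamma * v + (1.0 - gamma) * f(x)
    x = x + v
    return x, v

def momentum_inverse(x, v, f, gamma=0.9):
    """Exact inverse of the forward rule, allowing activations to be recomputed on the fly."""
    x_prev = x - v
    v_prev = (v - (1.0 - gamma) * f(x_prev)) / gamma
    return x_prev, v_prev
```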
no code implementations • 1 Jan 2021 • Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour
In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.
no code implementations • NeurIPS 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).
1 code implementation • 16 Oct 2020 • Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert
Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it can be negative and it is not minimized when the time series are equal.
3 code implementations • 20 Feb 2020 • Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach
Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).
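A minimal Monte Carlo sketch of the perturbation idea: a discrete argmax becomes smooth in expectation once its input is perturbed with random noise. The Gaussian noise, sample count, and noise scale below are illustrative choices, not the paper's exact setup:

```python
import numpy as np

def perturbed_argmax(theta, sigma=0.5, n_samples=1000, rng=None):
    """Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))], Z ~ N(0, I)."""
    rng = rng or np.random.default_rng(0)
    noise = rng.normal(size=(n_samples, theta.shape[0]))
    winners = np.argmax(theta + sigma * noise, axis=1)
    return np.bincount(winners, minlength=theta.shape[0]) / n_samples

theta = np.array([1.0, 0.8, -0.5])
print(perturbed_argmax(theta))  # a smooth relaxation of the one-hot argmax of theta
```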
2 code implementations • ICML 2020 • Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga
While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one would expect from sorting and ranking operations.
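For intuition only, a naive $O(n^2)$ soft-ranking sketch via pairwise sigmoids; the paper's contribution is achieving a differentiable relaxation in $O(n \log n)$ time via projections, which this snippet does not implement:

```python
import numpy as np

def soft_rank(x, temperature=0.1):
    """Differentiable approximation of ascending ranks: rank_i ~ 1 + sum_{j != i} sigmoid((x_i - x_j)/T)."""
    diff = (x[:, None] - x[None, :]) / temperature
    sig = 1.0 / (1.0 + np.exp(-diff))
    return 1.0 + sig.sum(axis=1) - 0.5  # subtract the i == j self-term, sigmoid(0) = 0.5

print(soft_rank(np.array([0.3, -1.0, 2.0])))  # close to [2, 1, 3] for small temperature
```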
1 code implementation • ICML 2020 • Quentin Bertrand, Quentin Klopfenstein, Mathieu Blondel, Samuel Vaiter, Alexandre Gramfort, Joseph Salmon
Our approach scales to high-dimensional data by leveraging the sparsity of the solutions.
1 code implementation • NeurIPS 2019 • Mathieu Blondel
We identify the marginal polytope, the output space's convex hull, as the best convex set on which to project.
no code implementations • 15 May 2019 • Arthur Mensch, Mathieu Blondel, Gabriel Peyré
Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.
3 code implementations • 8 Jan 2019 • Mathieu Blondel, André F. T. Martins, Vlad Niculae
Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.
2 code implementations • 24 May 2018 • Mathieu Blondel, André F. T. Martins, Vlad Niculae
This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function.
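A minimal sketch of the construction for one regularizer: with $\Omega$ the Shannon negative entropy restricted to the simplex, the Fenchel-Young loss $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$ reduces to the multinomial logistic (cross-entropy) loss, since $\Omega^*(\theta) = \log \sum_j \exp(\theta_j)$ and $\Omega(y) = 0$ for a one-hot $y$:

```python
import numpy as np
from scipy.special import logsumexp

def fy_logistic_loss(theta, k):
    """Fenchel-Young loss generated by negative entropy = logistic loss: logsumexp(theta) - theta_k."""
    return logsumexp(theta) - theta[k]

print(fy_logistic_loss(np.array([2.0, 0.5, -1.0]), k=0))  # ~0.24
```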
no code implementations • 15 Feb 2018 • Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada
Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention.
3 code implementations • ICML 2018 • Vlad Niculae, André F. T. Martins, Mathieu Blondel, Claire Cardie
Structured prediction requires searching over a combinatorial number of structures.
no code implementations • ICML 2018 • Arthur Mensch, Mathieu Blondel
We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.
no code implementations • ICLR 2018 • Vivien Seguy, Bharath Bhushan Damodaran, Remi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel
First, we learn an optimal transport (OT) plan, which can be thought of as a one-to-many map between the two distributions.
2 code implementations • 7 Nov 2017 • Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel
We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures.
1 code implementation • 17 Oct 2017 • Mathieu Blondel, Vivien Seguy, Antoine Rolet
In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations.
3 code implementations • NeurIPS 2017 • Vlad Niculae, Mathieu Blondel
Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input.
no code implementations • NeurIPS 2017 • Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda
On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.
9 code implementations • ICML 2017 • Marco Cuturi, Mathieu Blondel
We propose in this paper a differentiable learning loss between time series, building upon the celebrated dynamic time warping (DTW) discrepancy.
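A minimal sketch of the resulting recursion: the hard minimum in DTW's alignment dynamic program is replaced with a smoothed soft-minimum, which makes the quantity differentiable in its inputs. The squared Euclidean ground cost and the value of gamma below are illustrative defaults:

```python
import numpy as np

def softmin(values, gamma):
    """Smoothed minimum: -gamma * log(sum(exp(-v / gamma)))."""
    v = -np.asarray(values) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    """Soft-DTW between 1-D time series x and y with squared Euclidean ground cost."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + softmin([R[i - 1, j], R[i, j - 1], R[i - 1, j - 1]], gamma)
    return R[n, m]

print(soft_dtw(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0])))
```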
no code implementations • 29 Jul 2016 • Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, Naonori Ueda
Polynomial networks and factorization machines are two recently proposed models that can efficiently use feature interactions in classification and regression tasks.
4 code implementations • NeurIPS 2016 • Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata
Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional.
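For reference, a minimal sketch of the second-order factorization machine prediction, using the standard identity that makes the pairwise term computable in $O(nk)$ rather than $O(n^2)$ time (the weights below are random placeholders):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j, computed via the O(nk) identity
    sum_{i<j} <V_i, V_j> x_i x_j = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]."""
    linear = w0 + w @ x
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 6, 3
x, w0, w, V = rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V))
```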
4 code implementations • 1 Sep 2013 • Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux
Scikit-learn is an increasingly popular machine learning library.
3 code implementations • 2 Jan 2012 • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay
Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.
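A minimal usage example of the estimator API the library is organized around, on a toy dataset:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(clf.score(X_test, y_test))  # held-out accuracy
```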