Search Results for author: Mathieu Blondel

Found 40 papers, 21 papers with code

The Elements of Differentiable Programming

1 code implementation 21 Mar 2024 Mathieu Blondel, Vincent Roulet

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming.

How do Transformers perform In-Context Autoregressive Learning?

no code implementations 8 Feb 2024 Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré

More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.

Language Modelling

Direct Language Model Alignment from Online AI Feedback

no code implementations 7 Feb 2024 Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy.

Language Modelling

Routers in Vision Mixture of Experts: An Empirical Study

no code implementations 29 Jan 2024 Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert.

Language Modelling
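
As a rough illustration of the two variants (a schematic NumPy sketch under assumed shapes, not the paper's implementation), Token Choice lets each token pick its top-k experts from the router scores, while Expert Choice lets each expert pick its top-k tokens:

import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, k = 6, 4, 2
logits = rng.normal(size=(num_tokens, num_experts))    # router scores

# Token Choice: each token selects its k highest-scoring experts.
token_choice = np.argsort(-logits, axis=1)[:, :k]      # shape (num_tokens, k)

# Expert Choice: each expert selects its k highest-scoring tokens.
expert_choice = np.argsort(-logits, axis=0)[:k, :].T   # shape (num_experts, k)

print("experts per token:", token_choice)
print("tokens per expert:", expert_choice)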

Dual Gauss-Newton Directions for Deep Learning

no code implementations 17 Aug 2023 Vincent Roulet, Mathieu Blondel

Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization.
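
As a hedged sketch of the partial-linearization idea (my reading of the setup, not necessarily the paper's exact formulation): with an objective $\ell(g(w))$, where $\ell$ is a convex loss and $g$ a nonlinear network, one linearizes only $g$ around the current iterate $w_t$ and keeps $\ell$ exact, so the direction oracle solves a convex subproblem

$$d_t \in \operatorname*{argmin}_d \; \ell\big(g(w_t) + J_g(w_t)\, d\big) + \tfrac{1}{2\eta}\,\|d\|^2, \qquad w_{t+1} = w_t + d_t,$$

which reduces to a stochastic gradient step when $\ell$ is also linearized.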

Sparsity-Constrained Optimal Transport

no code implementations 30 Sep 2022 Tianlin Liu, Joan Puigcerver, Mathieu Blondel

The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.

Learning Energy Networks with Generalized Fenchel-Young Losses

no code implementations 19 May 2022 Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function.

Imitation Learning

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

no code implementations 24 Feb 2022 Robert M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa

We use this insight to develop new variants of the SPS method that are better suited to nonlinear models.
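
For context, a minimal sketch of the classical stochastic Polyak stepsize (SPS) that the paper builds on, assuming the interpolation setting where each per-example optimum $f_i^*$ is roughly zero; the constant c and the stepsize cap are illustrative choices:

import numpy as np

def sps_step(w, grad_i, loss_i, c=0.5, f_i_star=0.0, gamma_max=1.0):
    # Stepsize (f_i(w) - f_i^*) / (c * ||grad f_i(w)||^2), capped at gamma_max.
    sq_norm = float(np.dot(grad_i, grad_i)) + 1e-12
    gamma = min((loss_i - f_i_star) / (c * sq_norm), gamma_max)
    return w - gamma * grad_i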

Sinkformers: Transformers with Doubly Stochastic Attention

1 code implementation 22 Oct 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.

Image Classification
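
A minimal NumPy sketch of the mechanism (an illustration, not the paper's code): instead of a single row-wise softmax, a few Sinkhorn iterations alternately normalize rows and columns of the attention kernel, driving it toward a doubly stochastic matrix.

import numpy as np

def sinkhorn_attention(scores, n_iters=5):
    # Normalize exp(scores) so that rows and columns both sum to (approximately) one.
    K = np.exp(scores - scores.max())
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)   # row normalization (softmax-like)
        K = K / K.sum(axis=0, keepdims=True)   # column normalization
    return K

rng = np.random.default_rng(0)
A = sinkhorn_attention(rng.normal(size=(4, 4)))
print(A.sum(axis=1), A.sum(axis=0))   # both close to vectors of ones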

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation 4 Aug 2021 André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support.

Audio Classification Question Answering +1
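
For reference, sparsemax (one of the finite-domain alternatives mentioned above) is the Euclidean projection onto the probability simplex; a small NumPy sketch of the standard sort-and-threshold algorithm:

import numpy as np

def sparsemax(z):
    # Euclidean projection of z onto the simplex; output coordinates can be exactly zero.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = z_sorted + (1.0 - cumsum) / ks > 0
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1.0) / k
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.8, -1.0])))   # [0.6, 0.4, 0.0]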

Efficient and Modular Implicit Differentiation

1 code implementation NeurIPS 2021 Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

In this paper, we propose automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems.

Meta-Learning
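
A tiny NumPy illustration of the underlying implicit function theorem trick (not the paper's library): for ridge regression, the solution satisfies the stationarity condition $F(w, \lambda) = (X^\top X + \lambda I) w - X^\top y = 0$, so $\partial w^*/\partial \lambda = -(\partial F/\partial w)^{-1} \, \partial F/\partial \lambda = -(X^\top X + \lambda I)^{-1} w^*$, without differentiating through the solver.

import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(20, 3)), rng.normal(size=20), 0.1

def solve(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

w = solve(lam)
dw_dlam = -np.linalg.solve(X.T @ X + lam * np.eye(3), w)   # implicit differentiation

eps = 1e-6
print(dw_dlam)
print((solve(lam + eps) - solve(lam - eps)) / (2 * eps))   # finite-difference check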

Momentum Residual Neural Networks

1 code implementation 15 Feb 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.

Image Classification
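
A schematic NumPy sketch of the momentum residual update as I understand it (the paper's exact parameterization may differ): because the forward step can be inverted algebraically, intermediate activations need not be stored for backpropagation.

import numpy as np

def f(x):
    return np.tanh(x)          # stand-in for a residual branch

def forward(x, v, gamma=0.9):
    v_new = gamma * v + (1.0 - gamma) * f(x)
    x_new = x + v_new
    return x_new, v_new

def inverse(x_new, v_new, gamma=0.9):
    x = x_new - v_new                               # recover the input activation
    v = (v_new - (1.0 - gamma) * f(x)) / gamma      # then the previous velocity
    return x, v

x1, v1 = forward(np.array([0.5, -1.0]), np.zeros(2))
print(inverse(x1, v1))   # recovers the inputs up to floating-point error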

Shuffle to Learn: Self-supervised learning from permutations via differentiable ranking

no code implementations 1 Jan 2021 Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.

General Classification Self-Supervised Learning +1

Learning with Differentiable Perturbed Optimizers

no code implementations NeurIPS 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction

Differentiable Divergences Between Time Series

1 code implementation 16 Oct 2020 Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert

Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it can be negative and it is not minimized when the time series are equal.

Dynamic Time Warping Time Series +3
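
As I understand the fix, the paper debiases soft-DTW by subtracting the self-comparison terms, yielding a divergence that is non-negative and minimized when the two series are equal (hedged paraphrase):

$$D_\gamma(x, y) = \mathrm{SDTW}_\gamma(x, y) - \tfrac{1}{2}\,\mathrm{SDTW}_\gamma(x, x) - \tfrac{1}{2}\,\mathrm{SDTW}_\gamma(y, y).$$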

Learning with Differentiable Perturbed Optimizers

2 code implementations 20 Feb 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction
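
A minimal Monte Carlo sketch of the perturbation idea, assuming an argmax over a discrete score vector (an illustration, not the accompanying library): averaging hard argmax solutions under random noise produces a smoothed solution that varies continuously with the scores.

import numpy as np

def perturbed_argmax(theta, sigma=0.5, n_samples=1000, seed=0):
    # Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))], Z ~ N(0, I).
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_samples, theta.size))
    idx = np.argmax(theta + sigma * Z, axis=1)
    one_hot = np.zeros((n_samples, theta.size))
    one_hot[np.arange(n_samples), idx] = 1.0
    return one_hot.mean(axis=0)

print(perturbed_argmax(np.array([1.0, 0.9, -2.0])))   # mass spread over the top two scores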

Fast Differentiable Sorting and Ranking

2 code implementations ICML 2020 Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga

While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one would expect from sorting and ranking operations.
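
As a point of comparison for the $O(n \log n)$ claim, a common $O(n^2)$ differentiable proxy replaces pairwise comparisons with sigmoids (a generic baseline sketch, not this paper's projection-based operators):

import numpy as np

def soft_rank(x, temperature=0.1):
    # Smooth surrogate of descending ranks: rank_i ~ 1 + #{j : x_j > x_i}.
    diff = x[None, :] - x[:, None]                    # diff[i, j] = x_j - x_i
    pairwise = 1.0 / (1.0 + np.exp(-diff / temperature))
    return 1.0 + pairwise.sum(axis=1) - 0.5           # remove the i == j self-term

print(soft_rank(np.array([0.3, -1.2, 2.0])))   # close to [2., 3., 1.]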

Structured Prediction with Projection Oracles

1 code implementation NeurIPS 2019 Mathieu Blondel

We identify the marginal polytope, the output space's convex hull, as the best convex set on which to project.

Structured Prediction

Geometric Losses for Distributional Learning

no code implementations 15 May 2019 Arthur Mensch, Mathieu Blondel, Gabriel Peyré

Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.

regression

Learning with Fenchel-Young Losses

3 code implementations 8 Jan 2019 Mathieu Blondel, André F. T. Martins, Vlad Niculae

Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.

Structured Prediction

Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

2 code implementations 24 May 2018 Mathieu Blondel, André F. T. Martins, Vlad Niculae

This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function.
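
A small NumPy sketch of the construction for one familiar instance (my illustration): taking $\Omega$ to be the negative Shannon entropy restricted to the simplex gives $\Omega^*(\theta) = \mathrm{logsumexp}(\theta)$, and the Fenchel-Young loss $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$ reduces to the logistic (cross-entropy) loss for a one-hot $y$.

import numpy as np

def fenchel_young_logistic(theta, y_onehot):
    # Omega = negative Shannon entropy on the simplex; Omega(y) = 0 for one-hot y.
    omega_star = np.log(np.sum(np.exp(theta - theta.max()))) + theta.max()   # logsumexp
    return omega_star - np.dot(theta, y_onehot)

theta, y = np.array([2.0, -1.0, 0.5]), np.array([1.0, 0.0, 0.0])
print(fenchel_young_logistic(theta, y))
print(-np.log(np.exp(theta)[0] / np.exp(theta).sum()))   # matches -log softmax(theta)[0]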

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

no code implementations 15 Feb 2018 Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada

Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention.

blind source separation

Differentiable Dynamic Programming for Structured Prediction and Attention

no code implementations ICML 2018 Arthur Mensch, Mathieu Blondel

We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.

Machine Translation Structured Prediction +3

Large-Scale Optimal Transport and Mapping Estimation

2 code implementations 7 Nov 2017 Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel

We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures.

Domain Adaptation

Smooth and Sparse Optimal Transport

1 code implementation 17 Oct 2017 Mathieu Blondel, Vivien Seguy, Antoine Rolet

In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations.

A Regularized Framework for Sparse and Structured Neural Attention

3 code implementations NeurIPS 2017 Vlad Niculae, Mathieu Blondel

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input.

Machine Translation Natural Language Inference +3

Multi-output Polynomial Networks and Factorization Machines

no code implementations NeurIPS 2017 Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda

On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.

General Classification

Soft-DTW: a Differentiable Loss Function for Time-Series

8 code implementations ICML 2017 Marco Cuturi, Mathieu Blondel

We propose in this paper a differentiable learning loss between time series, building upon the celebrated dynamic time warping (DTW) discrepancy.

Dynamic Time Warping Time Series +1
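
A compact NumPy sketch of the soft-DTW recursion, assuming a squared Euclidean cost and soft-min smoothing with parameter gamma (illustrative only; the paper also derives the gradient via a backward pass):

import numpy as np

def softmin(a, b, c, gamma):
    # Smoothed minimum: -gamma * log(sum exp(-v / gamma)).
    v = -np.array([a, b, c]) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
    return R[n, m]

print(soft_dtw(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0])))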

Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms

no code implementations 29 Jul 2016 Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, Naonori Ueda

Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks.

General Classification Recommendation Systems +1

Higher-Order Factorization Machines

4 code implementations NeurIPS 2016 Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata

Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional.

Link Prediction
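
For reference, a minimal NumPy sketch of the second-order factorization machine prediction, using the well-known identity that evaluates the pairwise term in O(nk) time (an illustration of the model class, not the paper's ANOVA-kernel algorithm for higher orders):

import numpy as np

def fm_predict(x, w0, w, V):
    # y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j, with factor matrix V of shape (n, k).
    linear = w0 + np.dot(w, x)
    xv = V.T @ x
    pairwise = 0.5 * (np.sum(xv ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 5, 3
x, w0, w, V = rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V))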
