Search Results for author: Mathieu Blondel

Found 40 papers, 21 papers with code

The Elements of Differentiable Programming

1 code implementation 21 Mar 2024 Mathieu Blondel, Vincent Roulet

Artificial intelligence has recently experienced remarkable advances, fueled by large models, vast datasets, accelerated hardware, and, last but not least, the transformative power of differentiable programming.

How do Transformers perform In-Context Autoregressive Learning?

no code implementations 8 Feb 2024 Michael E. Sander, Raja Giryes, Taiji Suzuki, Mathieu Blondel, Gabriel Peyré

More precisely, focusing on commuting orthogonal matrices $W$, we first show that a trained one-layer linear Transformer implements one step of gradient descent for the minimization of an inner objective function, when considering augmented tokens.

Language Modelling

Direct Language Model Alignment from Online AI Feedback

no code implementations 7 Feb 2024 Shangmin Guo, Biao Zhang, Tianlin Liu, Tianqi Liu, Misha Khalman, Felipe Llinares, Alexandre Rame, Thomas Mesnard, Yao Zhao, Bilal Piot, Johan Ferret, Mathieu Blondel

Moreover, responses in these datasets are often sampled from a language model distinct from the one being aligned, and since the model evolves over training, the alignment phase is inevitably off-policy.

Language Modelling

Routers in Vision Mixture of Experts: An Empirical Study

no code implementations 29 Jan 2024 Tianlin Liu, Mathieu Blondel, Carlos Riquelme, Joan Puigcerver

Routers for sparse MoEs can be further grouped into two variants: Token Choice, which matches experts to each token, and Expert Choice, which matches tokens to each expert.

Language Modelling
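
As a rough illustration of the two variants (a schematic NumPy sketch under assumed shapes, not the paper's implementation), Token Choice lets each token pick its top-k experts from the router scores, while Expert Choice lets each expert pick its top-k tokens:

import numpy as np

rng = np.random.default_rng(0)
num_tokens, num_experts, k = 6, 4, 2
logits = rng.normal(size=(num_tokens, num_experts))    # router scores

# Token Choice: each token selects its k highest-scoring experts.
token_choice = np.argsort(-logits, axis=1)[:, :k]      # shape (num_tokens, k)

# Expert Choice: each expert selects its k highest-scoring tokens.
expert_choice = np.argsort(-logits, axis=0)[:k, :].T   # shape (num_experts, k)

print("experts per token:", token_choice)
print("tokens per expert:", expert_choice)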

Dual Gauss-Newton Directions for Deep Learning

no code implementations 17 Aug 2023 Vincent Roulet, Mathieu Blondel

Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization.
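
As a hedged sketch of the partial-linearization idea (my reading of the setup, not necessarily the paper's exact formulation): with an objective $\ell(g(w))$, where $\ell$ is a convex loss and $g$ a nonlinear network, one linearizes only $g$ around the current iterate $w_t$ and keeps $\ell$ exact, so the direction oracle solves a convex subproblem

$$d_t \in \operatorname*{argmin}_d \; \ell\big(g(w_t) + J_g(w_t)\, d\big) + \tfrac{1}{2\eta}\,\|d\|^2, \qquad w_{t+1} = w_t + d_t,$$

which reduces to a stochastic gradient step when $\ell$ is also linearized.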

Sparsity-Constrained Optimal Transport

no code implementations 30 Sep 2022 Tianlin Liu, Joan Puigcerver, Mathieu Blondel

The smoothness of the objectives increases as $k$ increases, giving rise to a trade-off between convergence speed and sparsity of the optimal plan.

Learning Energy Networks with Generalized Fenchel-Young Losses

no code implementations 19 May 2022 Mathieu Blondel, Felipe Llinares-López, Robert Dadashi, Léonard Hussenot, Matthieu Geist

To learn the parameters of the energy function, the solution to that optimization problem is typically fed into a loss function.

Imitation Learning

Cutting Some Slack for SGD with Adaptive Polyak Stepsizes

no code implementations 24 Feb 2022 Robert M. Gower, Mathieu Blondel, Nidham Gazagnadou, Fabian Pedregosa

We use this insight to develop new variants of the SPS method that are better suited to nonlinear models.
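
For context, a minimal sketch of the classical stochastic Polyak stepsize (SPS) that the paper builds on, assuming the interpolation setting where each per-example optimum $f_i^*$ is roughly zero; the constant c and the stepsize cap are illustrative choices:

import numpy as np

def sps_step(w, grad_i, loss_i, c=0.5, f_i_star=0.0, gamma_max=1.0):
    # Stepsize (f_i(w) - f_i^*) / (c * ||grad f_i(w)||^2), capped at gamma_max.
    sq_norm = float(np.dot(grad_i, grad_i)) + 1e-12
    gamma = min((loss_i - f_i_star) / (c * sq_norm), gamma_max)
    return w - gamma * grad_i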

Sinkformers: Transformers with Doubly Stochastic Attention

1 code implementation 22 Oct 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show that the row-wise stochastic attention matrices in classical Transformers get close to doubly stochastic matrices as the number of epochs increases, justifying the use of Sinkhorn normalization as an informative prior.

Image Classification
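
A minimal NumPy sketch of the mechanism (an illustration, not the paper's code): instead of a single row-wise softmax, a few Sinkhorn iterations alternately normalize rows and columns of the attention kernel, driving it toward a doubly stochastic matrix.

import numpy as np

def sinkhorn_attention(scores, n_iters=5):
    # Normalize exp(scores) so that rows and columns both sum to (approximately) one.
    K = np.exp(scores - scores.max())
    for _ in range(n_iters):
        K = K / K.sum(axis=1, keepdims=True)   # row normalization (softmax-like)
        K = K / K.sum(axis=0, keepdims=True)   # column normalization
    return K

rng = np.random.default_rng(0)
A = sinkhorn_attention(rng.normal(size=(4, 4)))
print(A.sum(axis=1), A.sum(axis=0))   # both close to vectors of ones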

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation 4 Aug 2021 André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support.

Audio Classification Question Answering +1
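
For reference, sparsemax (one of the finite-domain alternatives mentioned above) is the Euclidean projection onto the probability simplex; a small NumPy sketch of the standard sort-and-threshold algorithm:

import numpy as np

def sparsemax(z):
    # Euclidean projection of z onto the simplex; output coordinates can be exactly zero.
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    ks = np.arange(1, len(z) + 1)
    support = z_sorted + (1.0 - cumsum) / ks > 0
    k = ks[support][-1]
    tau = (cumsum[k - 1] - 1.0) / k
    return np.maximum(z - tau, 0.0)

print(sparsemax(np.array([1.0, 0.8, -1.0])))   # [0.6, 0.4, 0.0]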

Efficient and Modular Implicit Differentiation

1 code implementation NeurIPS 2021 Mathieu Blondel, Quentin Berthet, Marco Cuturi, Roy Frostig, Stephan Hoyer, Felipe Llinares-López, Fabian Pedregosa, Jean-Philippe Vert

In this paper, we propose automatic implicit differentiation, an efficient and modular approach for implicit differentiation of optimization problems.

Meta-Learning
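
A tiny NumPy illustration of the underlying implicit function theorem trick (not the paper's library): for ridge regression, the solution satisfies the stationarity condition $F(w, \lambda) = (X^\top X + \lambda I) w - X^\top y = 0$, so $\partial w^*/\partial \lambda = -(\partial F/\partial w)^{-1} \, \partial F/\partial \lambda = -(X^\top X + \lambda I)^{-1} w^*$, without differentiating through the solver.

import numpy as np

rng = np.random.default_rng(0)
X, y, lam = rng.normal(size=(20, 3)), rng.normal(size=20), 0.1

def solve(lam):
    return np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

w = solve(lam)
dw_dlam = -np.linalg.solve(X.T @ X + lam * np.eye(3), w)   # implicit differentiation

eps = 1e-6
print(dw_dlam)
print((solve(lam + eps) - solve(lam - eps)) / (2 * eps))   # finite-difference check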

Momentum Residual Neural Networks

1 code implementation 15 Feb 2021 Michael E. Sander, Pierre Ablin, Mathieu Blondel, Gabriel Peyré

We show on CIFAR and ImageNet that Momentum ResNets have the same accuracy as ResNets, while having a much smaller memory footprint, and show that pre-trained Momentum ResNets are promising for fine-tuning models.

Image Classification
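
A schematic NumPy sketch of the momentum residual update as I understand it (the paper's exact parameterization may differ): because the forward step can be inverted algebraically, intermediate activations need not be stored for backpropagation.

import numpy as np

def f(x):
    return np.tanh(x)          # stand-in for a residual branch

def forward(x, v, gamma=0.9):
    v_new = gamma * v + (1.0 - gamma) * f(x)
    x_new = x + v_new
    return x_new, v_new

def inverse(x_new, v_new, gamma=0.9):
    x = x_new - v_new                               # recover the input activation
    v = (v_new - (1.0 - gamma) * f(x)) / gamma      # then the previous velocity
    return x, v

x1, v1 = forward(np.array([0.5, -1.0]), np.zeros(2))
print(inverse(x1, v1))   # recovers the inputs up to floating-point error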

Shuffle to Learn: Self-supervised learning from permutations via differentiable ranking

no code implementations 1 Jan 2021 Andrew N Carr, Quentin Berthet, Mathieu Blondel, Olivier Teboul, Neil Zeghidour

In particular, we also improve music understanding by reordering spectrogram patches in the frequency space, as well as video classification by reordering frames along the time axis.

General Classification Self-Supervised Learning +1

Learning with Differentiable Perturbed Optimizers

no code implementations NeurIPS 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction

Differentiable Divergences Between Time Series

1 code implementation 16 Oct 2020 Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert

Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it can be negative and it is not minimized when the time series are equal.

Dynamic Time Warping Time Series +3
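
As I understand the fix, the paper debiases soft-DTW by subtracting the self-comparison terms, yielding a divergence that is non-negative and minimized when the two series are equal (hedged paraphrase):

$$D_\gamma(x, y) = \mathrm{SDTW}_\gamma(x, y) - \tfrac{1}{2}\,\mathrm{SDTW}_\gamma(x, x) - \tfrac{1}{2}\,\mathrm{SDTW}_\gamma(y, y).$$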

Learning with Differentiable Perturbed Optimizers

2 code implementations 20 Feb 2020 Quentin Berthet, Mathieu Blondel, Olivier Teboul, Marco Cuturi, Jean-Philippe Vert, Francis Bach

Machine learning pipelines often rely on optimization procedures to make discrete decisions (e.g., sorting, picking closest neighbors, or shortest paths).

Structured Prediction
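
A minimal Monte Carlo sketch of the perturbation idea, assuming an argmax over a discrete score vector (an illustration, not the accompanying library): averaging hard argmax solutions under random noise produces a smoothed solution that varies continuously with the scores.

import numpy as np

def perturbed_argmax(theta, sigma=0.5, n_samples=1000, seed=0):
    # Monte Carlo estimate of E[one_hot(argmax(theta + sigma * Z))], Z ~ N(0, I).
    rng = np.random.default_rng(seed)
    Z = rng.normal(size=(n_samples, theta.size))
    idx = np.argmax(theta + sigma * Z, axis=1)
    one_hot = np.zeros((n_samples, theta.size))
    one_hot[np.arange(n_samples), idx] = 1.0
    return one_hot.mean(axis=0)

print(perturbed_argmax(np.array([1.0, 0.9, -2.0])))   # mass spread over the top two scores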

Fast Differentiable Sorting and Ranking

2 code implementations ICML 2020 Mathieu Blondel, Olivier Teboul, Quentin Berthet, Josip Djolonga

While numerous works have proposed differentiable proxies to sorting and ranking, they do not achieve the $O(n \log n)$ time complexity one would expect from sorting and ranking operations.
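
As a point of comparison for the $O(n \log n)$ claim, a common $O(n^2)$ differentiable proxy replaces pairwise comparisons with sigmoids (a generic baseline sketch, not this paper's projection-based operators):

import numpy as np

def soft_rank(x, temperature=0.1):
    # Smooth surrogate of descending ranks: rank_i ~ 1 + #{j : x_j > x_i}.
    diff = x[None, :] - x[:, None]                    # diff[i, j] = x_j - x_i
    pairwise = 1.0 / (1.0 + np.exp(-diff / temperature))
    return 1.0 + pairwise.sum(axis=1) - 0.5           # remove the i == j self-term

print(soft_rank(np.array([0.3, -1.2, 2.0])))   # close to [2., 3., 1.]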

Structured Prediction with Projection Oracles

1 code implementation NeurIPS 2019 Mathieu Blondel

We identify the marginal polytope, the output space's convex hull, as the best convex set on which to project.

Structured Prediction

Geometric Losses for Distributional Learning

no code implementations 15 May 2019 Arthur Mensch, Mathieu Blondel, Gabriel Peyré

Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.

regression

Learning with Fenchel-Young Losses

3 code implementations 8 Jan 2019 Mathieu Blondel, André F. T. Martins, Vlad Niculae

Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.

Structured Prediction

Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

2 code implementations 24 May 2018 Mathieu Blondel, André F. T. Martins, Vlad Niculae

This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function.
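
A small NumPy sketch of the construction for one familiar instance (my illustration): taking $\Omega$ to be the negative Shannon entropy restricted to the simplex gives $\Omega^*(\theta) = \mathrm{logsumexp}(\theta)$, and the Fenchel-Young loss $L_\Omega(\theta; y) = \Omega^*(\theta) + \Omega(y) - \langle \theta, y \rangle$ reduces to the logistic (cross-entropy) loss for a one-hot $y$.

import numpy as np

def fenchel_young_logistic(theta, y_onehot):
    # Omega = negative Shannon entropy on the simplex; Omega(y) = 0 for one-hot y.
    omega_star = np.log(np.sum(np.exp(theta - theta.max()))) + theta.max()   # logsumexp
    return omega_star - np.dot(theta, y_onehot)

theta, y = np.array([2.0, -1.0, 0.5]), np.array([1.0, 0.0, 0.0])
print(fenchel_young_logistic(theta, y))
print(-np.log(np.exp(theta)[0] / np.exp(theta).sum()))   # matches -log softmax(theta)[0]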

Blind Source Separation with Optimal Transport Non-negative Matrix Factorization

no code implementations 15 Feb 2018 Antoine Rolet, Vivien Seguy, Mathieu Blondel, Hiroshi Sawada

Optimal transport as a loss for machine learning optimization problems has recently gained a lot of attention.

blind source separation

Differentiable Dynamic Programming for Structured Prediction and Attention

no code implementations ICML 2018 Arthur Mensch, Mathieu Blondel

We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.

Machine Translation Structured Prediction +3

Large-Scale Optimal Transport and Mapping Estimation

2 code implementations 7 Nov 2017 Vivien Seguy, Bharath Bhushan Damodaran, Rémi Flamary, Nicolas Courty, Antoine Rolet, Mathieu Blondel

We prove two theoretical stability results of regularized OT which show that our estimations converge to the OT plan and Monge map between the underlying continuous measures.

Domain Adaptation

Smooth and Sparse Optimal Transport

1 code implementation 17 Oct 2017 Mathieu Blondel, Vivien Seguy, Antoine Rolet

In this paper, we explore regularizing the primal and dual OT formulations with a strongly convex term, which corresponds to relaxing the dual and primal constraints with smooth approximations.

A Regularized Framework for Sparse and Structured Neural Attention

3 code implementations NeurIPS 2017 Vlad Niculae, Mathieu Blondel

Modern neural networks are often augmented with an attention mechanism, which tells the network where to focus within the input.

Machine Translation Natural Language Inference +3

Multi-output Polynomial Networks and Factorization Machines

no code implementations NeurIPS 2017 Mathieu Blondel, Vlad Niculae, Takuma Otsuka, Naonori Ueda

On recommendation system tasks, we show how to combine our algorithm with a reduction from ordinal regression to multi-output classification and show that the resulting algorithm outperforms simple baselines in terms of ranking accuracy.

General Classification

Soft-DTW: a Differentiable Loss Function for Time-Series

8 code implementations ICML 2017 Marco Cuturi, Mathieu Blondel

We propose in this paper a differentiable learning loss between time series, building upon the celebrated dynamic time warping (DTW) discrepancy.

Dynamic Time Warping Time Series +1
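
A compact NumPy sketch of the soft-DTW recursion, assuming a squared Euclidean cost and soft-min smoothing with parameter gamma (illustrative only; the paper also derives the gradient via a backward pass):

import numpy as np

def softmin(a, b, c, gamma):
    # Smoothed minimum: -gamma * log(sum exp(-v / gamma)).
    v = -np.array([a, b, c]) / gamma
    m = v.max()
    return -gamma * (m + np.log(np.exp(v - m).sum()))

def soft_dtw(x, y, gamma=1.0):
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i, j] = cost + softmin(R[i - 1, j], R[i, j - 1], R[i - 1, j - 1], gamma)
    return R[n, m]

print(soft_dtw(np.array([0.0, 1.0, 2.0]), np.array([0.0, 2.0])))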

Polynomial Networks and Factorization Machines: New Insights and Efficient Training Algorithms

no code implementations 29 Jul 2016 Mathieu Blondel, Masakazu Ishihata, Akinori Fujino, Naonori Ueda

Polynomial networks and factorization machines are two recently-proposed models that can efficiently use feature interactions in classification and regression tasks.

General Classification Recommendation Systems +1

Higher-Order Factorization Machines

4 code implementations NeurIPS 2016 Mathieu Blondel, Akinori Fujino, Naonori Ueda, Masakazu Ishihata

Factorization machines (FMs) are a supervised learning approach that can use second-order feature combinations even when the data is very high-dimensional.

Link Prediction
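
For reference, a minimal NumPy sketch of the second-order factorization machine prediction, using the well-known identity that evaluates the pairwise term in O(nk) time (an illustration of the model class, not the paper's ANOVA-kernel algorithm for higher orders):

import numpy as np

def fm_predict(x, w0, w, V):
    # y = w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j, with factor matrix V of shape (n, k).
    linear = w0 + np.dot(w, x)
    xv = V.T @ x
    pairwise = 0.5 * (np.sum(xv ** 2) - np.sum((V ** 2).T @ (x ** 2)))
    return linear + pairwise

rng = np.random.default_rng(0)
n, k = 5, 3
x, w0, w, V = rng.normal(size=n), 0.1, rng.normal(size=n), rng.normal(size=(n, k))
print(fm_predict(x, w0, w, V))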
