Search Results for author: Arthur Mensch

Found 22 papers, 9 papers with code

Extra-gradient with player sampling for faster convergence in n-player games

no code implementations • ICML 2020 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.
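A minimal numpy sketch of the extra-gradient scheme with player sampling, as we read it: each iteration extrapolates and then updates only a random subset of players. All names here are ours, and the paper's sampling and variance-reduction details are simplified away.

import numpy as np

rng = np.random.default_rng(0)

def extragradient_player_sampling(grads, params, lr=0.1, n_sampled=1, n_steps=200):
    # grads[i](params) -> gradient of player i's loss w.r.t. params[i].
    n_players = len(params)
    for _ in range(n_steps):
        sampled = rng.choice(n_players, size=n_sampled, replace=False)
        # Extrapolation: a look-ahead point, moving only the sampled players.
        lookahead = [p.copy() for p in params]
        for i in sampled:
            lookahead[i] = params[i] - lr * grads[i](params)
        # Update: gradients re-evaluated at the extrapolated point.
        for i in sampled:
            params[i] = params[i] - lr * grads[i](lookahead)
    return params

# Toy bilinear game: player 0 minimizes x*y, player 1 minimizes -x*y.
grads = [lambda p: p[1], lambda p: -p[0]]
params = extragradient_player_sampling(grads, [np.array([1.0]), np.array([1.0])], n_sampled=2)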

Scaling Language Models: Methods, Analysis & Insights from Training Gopher

no code implementations • NA 2021 • Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving

Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world.

Abstract Algebra Anachronisms +133

Adam is no better than normalized SGD: Dissecting how adaptivity improves GAN performance

no code implementations • 29 Sep 2021 • Samy Jelassi, Arthur Mensch, Gauthier Gidel, Yuanzhi Li

We empirically show that SGDA with the same vector norm as Adam reaches similar or even better performance than the latter.
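A minimal sketch of the SGDA side of this comparison, assuming that "same vector norm as Adam" amounts to rescaling each player's gradient to a controlled norm before stepping; the function name and setup are illustrative, not the paper's code.

import numpy as np

def normalized_sgda_step(x, y, grad_x, grad_y, lr=1e-3, eps=1e-8):
    # One simultaneous descent-ascent step with per-player gradient normalization,
    # so the update magnitude is set by lr alone, as with Adam-like steps.
    gx, gy = grad_x(x, y), grad_y(x, y)
    x = x - lr * gx / (np.linalg.norm(gx) + eps)   # descent player (e.g. generator)
    y = y + lr * gy / (np.linalg.norm(gy) + eps)   # ascent player (e.g. discriminator)
    return x, y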

Differentiable Divergences Between Time Series

1 code implementation • 16 Oct 2020 • Mathieu Blondel, Arthur Mensch, Jean-Philippe Vert

Soft-DTW addresses these issues, but it is not a positive definite divergence: due to the bias introduced by entropic regularization, it can be negative and it is not minimized when the time series are equal.

Dynamic Time Warping Time Series Averaging +1
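For context, a toy numpy sketch of soft-DTW and of the debiasing construction the abstract alludes to (subtracting half of each self-similarity term, so the divergence vanishes when the series are equal). This is our simplified reading, not the paper's reference implementation.

import numpy as np

def soft_dtw(x, y, gamma=1.0):
    # Soft-DTW: dynamic time warping with min replaced by a soft-min.
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            prev = np.array([D[i - 1, j], D[i, j - 1], D[i - 1, j - 1]])
            # soft-min_gamma(a) = -gamma * log sum exp(-a / gamma), stabilized.
            pm = prev.min()
            D[i, j] = cost + pm - gamma * np.log(np.sum(np.exp(-(prev - pm) / gamma)))
    return D[n, m]

def soft_dtw_divergence(x, y, gamma=1.0):
    # Debiased: zero, and minimized, when x == y.
    return soft_dtw(x, y, gamma) - 0.5 * (soft_dtw(x, x, gamma) + soft_dtw(y, y, gamma))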

Fine-grain atlases of functional modes for fMRI analysis

no code implementations • 5 Mar 2020 • Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J. Gorgolewski, Demian Wassermann, Bertrand Thirion, Arthur Mensch

We demonstrate the benefits of extracting reduced signals on our fine-grain atlases for many classic functional data analysis pipelines: stimuli decoding from 12,334 brain responses, standard GLM analysis of fMRI across sessions and individuals, extraction of resting-state functional-connectome biomarkers for 2,500 individuals, data compression and meta-analysis over more than 15,000 statistical maps.

Data Compression
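A minimal sketch of "extracting reduced signals" with such an atlas, assuming nilearn (>= 0.9) and its DiFuMo fetcher; the file name is a placeholder.

from nilearn.datasets import fetch_atlas_difumo
from nilearn.maskers import NiftiMapsMasker

atlas = fetch_atlas_difumo(dimension=64)              # fine-grain functional modes
masker = NiftiMapsMasker(maps_img=atlas.maps, standardize=True)

# 4D fMRI image in, (n_timepoints, 64) array of mode time-series out;
# these reduced signals then feed decoding, GLM, or connectome pipelines.
reduced_signals = masker.fit_transform("subject_bold.nii.gz")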

Online Sinkhorn: Optimal Transport distances from sample streams

no code implementations • NeurIPS 2020 • Arthur Mensch, Gabriel Peyré

Optimal Transport (OT) distances are now routinely used as loss functions in ML tasks.
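For background, the classic batch Sinkhorn iteration for entropic-regularized OT, in numpy. The paper's contribution is an online variant estimated from sample streams, which this static sketch does not implement.

import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=500):
    # a, b: histograms summing to one; C: pairwise cost matrix.
    K = np.exp(-C / eps)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)
        u = a / (K @ v)
    P = u[:, None] * K * v[None, :]     # regularized transport plan
    return np.sum(P * C)

# Example: two uniform point clouds on the line.
x, y = np.linspace(0, 1, 5), np.linspace(0.2, 1.2, 5)
cost = sinkhorn(np.full(5, 0.2), np.full(5, 0.2), (x[:, None] - y[None, :]) ** 2)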

A mean-field analysis of two-player zero-sum games

no code implementations • NeurIPS 2020 • Carles Domingo-Enrich, Samy Jelassi, Arthur Mensch, Grant Rotskoff, Joan Bruna

Our method identifies mixed equilibria in high dimensions and is demonstrably effective for training mixtures of GANs.

Extragradient with player sampling for faster Nash equilibrium finding

1 code implementation • 29 May 2019 • Samy Jelassi, Carles Domingo-Enrich, Damien Scieur, Arthur Mensch, Joan Bruna

Data-driven modeling increasingly requires finding a Nash equilibrium in multi-player games, e.g. when training GANs.

Geometric Losses for Distributional Learning

no code implementations • 15 May 2019 • Arthur Mensch, Mathieu Blondel, Gabriel Peyré

Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes.
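As a hedged illustration of what "a logistic loss that incorporates a cost between classes" can look like, here is the simpler softmax-margin construction; the paper's geometric loss is instead derived from entropy-regularized OT and is not reproduced here.

import numpy as np

def softmax_margin_loss(scores, y, C):
    # scores: (K,) class scores; y: true class index;
    # C[k, y]: cost of predicting k when the truth is y, with C[y, y] == 0.
    # With C == 0 this reduces exactly to the usual logistic (cross-entropy) loss.
    z = scores + C[:, y]                 # costly confusions are penalized more
    m = z.max()
    return m + np.log(np.sum(np.exp(z - m))) - scores[y]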

Extracting representations of cognition across neuroimaging studies improves brain decoding

1 code implementation • 17 Sep 2018 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Analyzing data across studies could bring more statistical power; yet the current brain-imaging analytic framework cannot be used at scale as it requires casting all cognitive tasks in a unified theoretical framework.

Brain Decoding

Differentiable Dynamic Programming for Structured Prediction and Attention

no code implementations • ICML 2018 • Arthur Mensch, Mathieu Blondel

We showcase these instantiations on two structured prediction tasks and on structured and sparse attention for neural machine translation.

Machine Translation Structured Prediction +2
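A minimal sketch of the core idea as we understand it: replace the max in a dynamic program with a smoothed max (here log-sum-exp), which makes the optimal value differentiable; its gradients then yield the soft alignments usable as structured attention.

import numpy as np

def smoothed_max(v, gamma=1.0):
    # Differentiable relaxation of max; recovers max as gamma -> 0.
    m = v.max()
    return m + gamma * np.log(np.sum(np.exp((v - m) / gamma)))

def soft_path_value(theta, gamma=1.0):
    # theta: (T, K, K), theta[t, i, j] = score of moving from state i to j at step t.
    # Same recursion as Viterbi, with max replaced by smoothed_max.
    T, K, _ = theta.shape
    v = np.zeros(K)
    for t in range(T):
        v = np.array([smoothed_max(v + theta[t, :, j], gamma) for j in range(K)])
    return smoothed_max(v, gamma)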

Stochastic Subsampling for Factorizing Huge Matrices

1 code implementation • 19 Jan 2017 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

We present a matrix-factorization algorithm that scales to input matrices with a huge number of both rows and columns.

Dictionary Learning
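A toy numpy sketch of the subsampling idea: each update sees one column and a random subset of rows, so the per-iteration cost is independent of the full number of rows. This simplifies away the paper's surrogate aggregation and convergence machinery.

import numpy as np

rng = np.random.default_rng(0)

def subsampled_mf(X, k=10, lr=0.1, row_ratio=0.1, n_epochs=1):
    # Factorize X (n_rows, n_cols) as D @ A with streamed columns and sampled rows.
    n_rows, n_cols = X.shape
    D = 0.1 * rng.standard_normal((n_rows, k))       # dictionary
    A = np.zeros((k, n_cols))                        # codes
    n_sub = max(1, int(row_ratio * n_rows))
    for _ in range(n_epochs):
        for j in rng.permutation(n_cols):
            rows = rng.choice(n_rows, size=n_sub, replace=False)
            Dr, xr = D[rows], X[rows, j]
            # Ridge-regularized code update on the sampled rows only.
            A[:, j] = np.linalg.solve(Dr.T @ Dr + 1e-3 * np.eye(k), Dr.T @ xr)
            # Gradient step on the dictionary, touching only the sampled rows.
            D[rows] -= lr * np.outer(Dr @ A[:, j] - xr, A[:, j])
    return D, A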

Learning brain regions via large-scale online structured sparse dictionary learning

no code implementations • NeurIPS 2016 • Elvis Dohmatob, Arthur Mensch, Gaël Varoquaux, Bertrand Thirion

We propose a multivariate online dictionary-learning method for obtaining decompositions of brain images with structured and sparse components (aka atoms).

Dictionary Learning

Subsampled online matrix factorization with convergence guarantees

1 code implementation • 30 Nov 2016 • Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion

We present a matrix factorization algorithm that scales to input matrices that are large in both dimensions (i.e., that contain more than 1 TB of data).

Dictionary Learning for Massive Matrix Factorization

1 code implementation • 3 May 2016 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising.

Collaborative Filtering Dictionary Learning +2
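A minimal usage sketch for this problem class with scikit-learn's online dictionary learner, as a standard baseline; the paper's own algorithm ships in the linked implementation, not here.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

X = np.random.default_rng(0).standard_normal((500, 64))    # samples x features

dl = MiniBatchDictionaryLearning(n_components=20, alpha=1.0,
                                 batch_size=32, random_state=0)
codes = dl.fit_transform(X)       # sparse codes, shape (500, 20)
atoms = dl.components_            # dictionary atoms, shape (20, 64)

X_hat = codes @ atoms             # reconstruction, for completion / denoising use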

Compressed Online Dictionary Learning for Fast fMRI Decomposition

no code implementations • 8 Feb 2016 • Arthur Mensch, Gaël Varoquaux, Bertrand Thirion

We present a method for fast resting-state fMRI spatial decompositions of very large datasets, based on the reduction of the temporal dimension before applying dictionary learning on concatenated individual records from groups of subjects.

Dictionary Learning
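A rough sketch of the pipeline shape: compress the temporal dimension first, then run dictionary learning on the compressed records. The random-projection reduction below is one simple choice and may differ from the paper's; scikit-learn stands in for the dictionary-learning stage.

import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning

rng = np.random.default_rng(0)

# Toy stand-in for one subject's data: (n_timepoints, n_voxels).
X = rng.standard_normal((2000, 500))

# 1) Reduce the temporal dimension with a random projection.
q = 50
Omega = rng.standard_normal((q, X.shape[0])) / np.sqrt(q)
X_small = Omega @ X                                  # (q, n_voxels)

# 2) Online dictionary learning on the compressed data.
dl = MiniBatchDictionaryLearning(n_components=20, alpha=1.0, random_state=0)
dl.fit(X_small)
spatial_maps = dl.components_                        # (20, n_voxels)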
