Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

1 code implementation 15 Mar 2022 Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words.

Benchmarking missing-values approaches for predictive models on health databases

1 code implementation 17 Feb 2022 Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline

Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning.


Preventing dataset shift from breaking machine-learning biomarkers

1 code implementation 21 Jul 2021 Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline

When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers.

What's a good imputation to predict with missing values?

1 code implementation 1 Jun 2021 Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.


Common Limitations of Image Processing Metrics: A Picture Story

1 code implementation 12 Apr 2021 Annika Reinke, Minu D. Tizabi, Carole H. Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Peter Bankhead, Arriel Benis, M. Jorge Cardoso, Veronika Cheplygina, Evangelia Christodoulou, Beth Cimini, Gary S. Collins, Keyvan Farahani, Bram van Ginneken, Ben Glocker, Patrick Godau, Fred Hamprecht, Daniel A. Hashimoto, Doreen Heckmann-Nötzel, Michael M. Hoffmann, Merel Huisman, Fabian Isensee, Pierre Jannin, Charles E. Kahn, Alexandros Karargyris, Alan Karthikesalingam, Bernhard Kainz, Emre Kavur, Hannes Kenngott, Jens Kleesiek, Thijs Kooi, Michal Kozubek, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering, Bjoern Menze, David Moher, Karel G. M. Moons, Henning Müller, Brennan Nichyporuk, Felix Nickel, Jens Petersen, Gorkem Polat, Nasir Rajpoot, Mauricio Reyes, Nicola Rieke, Michael Riegler, Hassan Rivaz, Julio Saez-Rodriguez, Clarisa Sanchez Gutierrez, Julien Schroeter, Anindo Saha, Shravya Shetty, Maarten van Smeden, Bram Stieltjes, Ronald M. Summers, Abdel A. Taha, Sotirios A. Tsaftaris, Ben van Calster, Gaël Varoquaux, Manuel Wiesenfarth, Ziv R. Yaniv, Annette Kopp-Schneider, Paul Jäger, Lena Maier-Hein

While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation.

How I failed machine learning in medical imaging -- shortcomings and recommendations

1 code implementation 18 Mar 2021 Gaël Varoquaux, Veronika Cheplygina

Finally, we provide a broad range of recommendations on how to further address these problems in the future.

Accounting for Variance in Machine Learning Benchmarks

no code implementations 1 Mar 2021 Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices.

A Lightweight Neural Model for Biomedical Entity Linking

1 code implementation 16 Dec 2020 Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard entities in a given knowledge base.

NeuMiss networks: differentiable programming for supervised learning with missing values

no code implementations 3 Jul 2020 Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.


Fine-grain atlases of functional modes for fMRI analysis

no code implementations 5 Mar 2020 Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J. Gorgolewski, Demian Wassermann, Bertrand Thirion, Arthur Mensch

We demonstrate the benefits of extracting reduced signals on our fine-grain atlases for many classic functional data analysis pipelines: stimuli decoding from 12,334 brain responses, standard GLM analysis of fMRI across sessions and individuals, extraction of resting-state functional-connectomes biomarkers for 2,500 individuals, data compression and meta-analysis over more than 15,000 statistical maps.

NeuroQuery: comprehensive meta-analysis of human brain mapping

no code implementations 21 Feb 2020 Jérôme Dockès, Russell Poldrack, Romain Primet, Hande Gözükan, Tal Yarkoni, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms.

Linear predictor on linearly-generated data with missing values: non consistency and solutions

1 code implementation 3 Feb 2020 Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and the various missing-value indicators.

Encoding high-cardinality string categorical variables

1 code implementation 3 Jul 2019 Patricio Cerda, Gaël Varoquaux

We introduce two encoding approaches for string categories: a Gamma-Poisson matrix factorization on substring counts, and the min-hash encoder, for fast approximation of string similarities.
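
As a rough illustration of the min-hash idea, the sketch below hashes the character 3-grams of a string with several salted hash functions and keeps the minimum per function. The fraction of matching components between two signatures then approximates the Jaccard similarity of the two 3-gram sets. The function names (`ngrams`, `minhash_signature`, `estimated_similarity`), the salted MD5 hashing, and the default parameters are assumptions made for this example; this is not the authors' implementation.

```python
import hashlib


def ngrams(text, n=3):
    """Return the set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def minhash_signature(text, n_components=64, n=3):
    """For each salted hash function, keep the minimum hash value over the n-grams."""
    grams = ngrams(text, n)
    signature = []
    for seed in range(n_components):
        salt = str(seed).encode()  # hypothetical way of deriving distinct hash functions
        signature.append(min(
            int.from_bytes(hashlib.md5(salt + g.encode()).digest()[:8], "big")
            for g in grams
        ))
    return signature


def estimated_similarity(sig_a, sig_b):
    """Fraction of equal components, an estimate of the Jaccard similarity of the n-gram sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Strings that share most of their substrings, such as "police officer" and "police officer ii", should obtain a much higher estimated similarity than unrelated strings, and each signature has a fixed length whatever the number of distinct categories.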

On the consistency of supervised learning with missing values

2 code implementations 19 Feb 2019 Julie Josse, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

A striking result is that the widely-used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
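
For readers unfamiliar with constant imputation, here is a minimal sketch of mean imputation applied before learning. Column means are estimated on the training rows only and then reused to fill missing entries (encoded as NaN) in any new rows. The helper names `column_means` and `impute`, and the fallback of 0.0 for a fully missing column, are assumptions made for this example.

```python
import math


def column_means(rows):
    """Mean of the observed (non-NaN) values in each column of the training rows."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        # Assumption: fall back to 0.0 when a column has no observed value.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return means


def impute(rows, means):
    """Replace each NaN entry by the precomputed mean of its column."""
    return [
        [means[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


nan = math.nan
X_train = [[1.0, nan], [3.0, 4.0], [nan, 6.0]]
means = column_means(X_train)
X_train_imputed = impute(X_train, means)
X_test_imputed = impute([[nan, nan]], means)
```

The imputed matrices can then be passed to any supervised learner; the same `means` must be applied at prediction time so that training and test data are filled consistently.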


Approximate message-passing for convex optimization with non-separable penalties

no code implementations 17 Sep 2018 Andre Manoel, Florent Krzakala, Gaël Varoquaux, Bertrand Thirion, Lenka Zdeborová

We introduce an iterative optimization scheme for convex objectives consisting of a linear loss and a non-separable penalty, based on the expectation-consistent approximation and the vector approximate message-passing (VAMP) algorithm.

Extracting representations of cognition across neuroimaging studies improves brain decoding

1 code implementation 17 Sep 2018 Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Analyzing data across studies could bring more statistical power; yet the current brain-imaging analytic framework cannot be used at scale as it requires casting all cognitive tasks in a unified theoretical framework.

Similarity encoding for learning with dirty categorical variables

2 code implementations 4 Jun 2018 Patricio Cerda, Gaël Varoquaux, Balázs Kégl

We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains.
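
One way to picture how similarity encoding exposes redundancy between near-duplicate categories is sketched below: each string is encoded as a vector of its character 3-gram Jaccard similarities to a list of reference categories, so a misspelled entry lands close to its clean counterpart instead of becoming an unrelated one-hot column. The function names, the choice of 3-gram Jaccard similarity, and the example categories are assumptions made for illustration, not the paper's code.

```python
def char_ngrams(text, n=3):
    """Set of character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    if len(padded) < n:
        return {padded}
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def ngram_similarity(a, b, n=3):
    """Jaccard similarity between the n-gram sets of two strings."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    return len(grams_a & grams_b) / len(grams_a | grams_b)


def similarity_encode(value, reference_categories, n=3):
    """Encode a string as its similarities to each reference category."""
    return [ngram_similarity(value, ref, n) for ref in reference_categories]


references = ["police officer", "librarian"]   # hypothetical clean categories
exact = similarity_encode("police officer", references)
typo = similarity_encode("police oficer", references)
```

With identical strings the encoding reduces to a one-hot vector, which is why this scheme can be seen as a relaxation of one-hot encoding for dirty categories.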

Text to brain: predicting the spatial distribution of neuroimaging observations from text reports

no code implementations 4 Jun 2018 Jérôme Dockès, Demian Wassermann, Russell Poldrack, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

In this paper, we propose to mine brain medical publications to learn the spatial distribution associated with anatomical terms.

Cross-validation failure: small sample sizes lead to large error bars

1 code implementation 23 Jun 2017 Gaël Varoquaux

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers.

Subsampled online matrix factorization with convergence guarantees

1 code implementation 30 Nov 2016 Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion

We present a matrix factorization algorithm that scales to input matrices that are large in both dimensions (i.e., that contain more than 1TB of data).

Social-sparsity brain decoders: faster spatial sparsity

no code implementations 21 Jun 2016 Gaël Varoquaux, Matthieu Kowalski, Bertrand Thirion

Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions.

Learning to Discover Sparse Graphical Models

1 code implementation ICML 2017 Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.

Dictionary Learning for Massive Matrix Factorization

1 code implementation 3 May 2016 Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising.

Compressed Online Dictionary Learning for Fast fMRI Decomposition

no code implementations 8 Feb 2016 Arthur Mensch, Gaël Varoquaux, Bertrand Thirion

We present a method for fast resting-state fMRI spatial decompositions of very large datasets, based on the reduction of the temporal dimension before applying dictionary learning on concatenated individual records from groups of subjects.

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

no code implementations NeurIPS 2016 Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko

We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator.

FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging

no code implementations 22 Dec 2015 Gaël Varoquaux, Michael Eickenberg, Elvis Dohmatob, Bertrand Thirion

The total variation (TV) penalty, as many other analysis-sparsity problems, does not lead to separable factors or a proximal operator with a closed-form expression, such as soft thresholding for the $\ell_1$ penalty.

Mapping cognitive ontologies to and from the brain

no code implementations 15 Nov 2013 Yannick Schwartz, Bertrand Thirion, Gaël Varoquaux

Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies.

