Search Results for author: Gaël Varoquaux

Found 47 papers, 29 papers with code

CARTE: pretraining and transfer for tabular learning

no code implementations • 26 Feb 2024 • Myung Jun Kim, Léo Grinsztajn, Gaël Varoquaux

The architecture - CARTE for Context Aware Representation of Table Entries - uses a graph representation of tabular (or relational) data to process tables with different columns, string embeddings of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries.

Data Integration

Reconfidencing LLMs from the Grouping Loss Perspective

no code implementations • 7 Feb 2024 • Lihu Chen, Alexandre Perez-Lebel, Fabian M. Suchanek, Gaël Varoquaux

In this work, we construct a new evaluation dataset derived from a knowledge base to assess confidence scores given to answers of Mistral and LLaMA.

Uncertainty Quantification

Learning High-Quality and General-Purpose Phrase Representations

1 code implementation • 18 Jan 2024 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

The framework employs phrase type classification as an auxiliary task and incorporates character-level information more effectively into the phrase representation.

Contrastive Learning Data Augmentation +1

Vectorizing string entries for data processing on tables: when are larger language models better?

no code implementations • 15 Dec 2023 • Léo Grinsztajn, Edouard Oyallon, Myung Jun Kim, Gaël Varoquaux

We study the benefits of language models in 14 analytical tasks on tables while varying the training size, as well as for a fuzzy join benchmark.

The Locality and Symmetry of Positional Encodings

1 code implementation • 19 Oct 2023 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Positional Encodings (PEs) are used to inject word-order information into transformer-based language models.

Sentence

Causal thinking for decision making on Electronic Health Records: why and how

1 code implementation • 3 Aug 2023 • Matthieu Doutreligne, Tristan Struja, Judith Abecassis, Claire Morgand, Leo Anthony Celi, Gaël Varoquaux

We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV).

Decision Making valid

GLADIS: A General and Large Acronym Disambiguation Benchmark

1 code implementation • 3 Feb 2023 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Acronym Disambiguation (AD) is crucial for natural language understanding on various sources, including biomedical reports, scientific papers, and search engine queries.

Language Modelling Natural Language Understanding

Understanding metric-related pitfalls in image analysis validation

no code implementations • 3 Feb 2023 • Annika Reinke, Minu D. Tizabi, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, A. Emre Kavur, Tim Rädsch, Carole H. Sudre, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew Blaschko, Florian Buettner, M. Jorge Cardoso, Veronika Cheplygina, Jianxu Chen, Evangelia Christodoulou, Beth A. Cimini, Gary S. Collins, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Ben Glocker, Patrick Godau, Robert Haase, Daniel A. Hashimoto, Michael M. Hoffman, Merel Huisman, Fabian Isensee, Pierre Jannin, Charles E. Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris, Alan Karthikesalingam, Hannes Kenngott, Jens Kleesiek, Florian Kofler, Thijs Kooi, Annette Kopp-Schneider, Michal Kozubek, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering, Bjoern Menze, Karel G. M. Moons, Henning Müller, Brennan Nichyporuk, Felix Nickel, Jens Petersen, Susanne M. Rafelski, Nasir Rajpoot, Mauricio Reyes, Michael A. Riegler, Nicola Rieke, Julio Saez-Rodriguez, Clara I. Sánchez, Shravya Shetty, Maarten van Smeden, Ronald M. Summers, Abdel A. Taha, Aleksei Tiulpin, Sotirios A. Tsaftaris, Ben van Calster, Gaël Varoquaux, Manuel Wiesenfarth, Ziv R. Yaniv, Paul F. Jäger, Lena Maier-Hein

Validation metrics are key for the reliable tracking of scientific progress and for bridging the current chasm between artificial intelligence (AI) research and its translation into practice.

How to select predictive models for causal inference?

no code implementations • 1 Feb 2023 • Matthieu Doutreligne, Gaël Varoquaux

But does computing these nuisances add noise to model selection?

Causal Inference Model Selection +1

Beyond calibration: estimating the grouping loss of modern neural networks

2 code implementations • 28 Oct 2022 • Alexandre Perez-Lebel, Marine Le Morvan, Gaël Varoquaux

Yet calibration is not enough: even a perfectly calibrated classifier with the best possible accuracy can have confidence scores that are far from the true posterior probabilities.

Decision Making

Why do tree-based models still outperform deep learning on tabular data?

1 code implementation • 18 Jul 2022 • Léo Grinsztajn, Edouard Oyallon, Gaël Varoquaux

While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear.

Benchmarking

Metrics reloaded: Recommendations for image analysis validation

1 code implementation • 3 Jun 2022 • Lena Maier-Hein, Annika Reinke, Patrick Godau, Minu D. Tizabi, Florian Buettner, Evangelia Christodoulou, Ben Glocker, Fabian Isensee, Jens Kleesiek, Michal Kozubek, Mauricio Reyes, Michael A. Riegler, Manuel Wiesenfarth, A. Emre Kavur, Carole H. Sudre, Michael Baumgartner, Matthias Eisenmann, Doreen Heckmann-Nötzel, Tim Rädsch, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Arriel Benis, Matthew Blaschko, M. Jorge Cardoso, Veronika Cheplygina, Beth A. Cimini, Gary S. Collins, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Robert Haase, Daniel A. Hashimoto, Michael M. Hoffman, Merel Huisman, Pierre Jannin, Charles E. Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris, Alan Karthikesalingam, Hannes Kenngott, Florian Kofler, Annette Kopp-Schneider, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering, Bjoern Menze, Karel G. M. Moons, Henning Müller, Brennan Nichyporuk, Felix Nickel, Jens Petersen, Nasir Rajpoot, Nicola Rieke, Julio Saez-Rodriguez, Clara I. Sánchez, Shravya Shetty, Maarten van Smeden, Ronald M. Summers, Abdel A. Taha, Aleksei Tiulpin, Sotirios A. Tsaftaris, Ben van Calster, Gaël Varoquaux, Paul F. Jäger

The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output.

Instance Segmentation object-detection +2

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

1 code implementation • 15 Mar 2022 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words.

Contrastive Learning Language Modelling +1

Benchmarking missing-values approaches for predictive models on health databases

1 code implementation • 17 Feb 2022 • Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, Jean-Baptiste Poline

Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning.

Attribute Benchmarking +1

Label scarcity in biomedicine: Data-rich latent factor discovery enhances phenotype prediction

no code implementations • 12 Oct 2021 • Marc-Andre Schulz, Bertrand Thirion, Alexandre Gramfort, Gaël Varoquaux, Danilo Bzdok

High-quality data accumulation is now becoming ubiquitous in the health domain.

Dimensionality Reduction

Preventing dataset shift from breaking machine-learning biomarkers

1 code implementation • 21 Jul 2021 • Jérôme Dockès, Gaël Varoquaux, Jean-Baptiste Poline

When a dataset shift occurs, standard machine-learning techniques do not suffice to extract and validate biomarkers.

BIG-bench Machine Learning

What's a good imputation to predict with missing values?

1 code implementation • 1 Jun 2021 • Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.

Imputation regression

Common Limitations of Image Processing Metrics: A Picture Story

1 code implementation • 12 Apr 2021 • Annika Reinke, Minu D. Tizabi, Carole H. Sudre, Matthias Eisenmann, Tim Rädsch, Michael Baumgartner, Laura Acion, Michela Antonelli, Tal Arbel, Spyridon Bakas, Peter Bankhead, Arriel Benis, Matthew Blaschko, Florian Buettner, M. Jorge Cardoso, Jianxu Chen, Veronika Cheplygina, Evangelia Christodoulou, Beth Cimini, Gary S. Collins, Sandy Engelhardt, Keyvan Farahani, Luciana Ferrer, Adrian Galdran, Bram van Ginneken, Ben Glocker, Patrick Godau, Robert Haase, Fred Hamprecht, Daniel A. Hashimoto, Doreen Heckmann-Nötzel, Peter Hirsch, Michael M. Hoffman, Merel Huisman, Fabian Isensee, Pierre Jannin, Charles E. Kahn, Dagmar Kainmueller, Bernhard Kainz, Alexandros Karargyris, Alan Karthikesalingam, A. Emre Kavur, Hannes Kenngott, Jens Kleesiek, Andreas Kleppe, Sven Kohler, Florian Kofler, Annette Kopp-Schneider, Thijs Kooi, Michal Kozubek, Anna Kreshuk, Tahsin Kurc, Bennett A. Landman, Geert Litjens, Amin Madani, Klaus Maier-Hein, Anne L. Martel, Peter Mattson, Erik Meijering, Bjoern Menze, David Moher, Karel G. M. Moons, Henning Müller, Brennan Nichyporuk, Felix Nickel, M. Alican Noyan, Jens Petersen, Gorkem Polat, Susanne M. Rafelski, Nasir Rajpoot, Mauricio Reyes, Nicola Rieke, Michael Riegler, Hassan Rivaz, Julio Saez-Rodriguez, Clara I. Sánchez, Julien Schroeter, Anindo Saha, M. Alper Selver, Lalith Sharan, Shravya Shetty, Maarten van Smeden, Bram Stieltjes, Ronald M. Summers, Abdel A. Taha, Aleksei Tiulpin, Sotirios A. Tsaftaris, Ben van Calster, Gaël Varoquaux, Manuel Wiesenfarth, Ziv R. Yaniv, Paul Jäger, Lena Maier-Hein

While the importance of automatic image analysis is continuously increasing, recent meta-research revealed major flaws with respect to algorithm validation.

Instance Segmentation object-detection +2

How I failed machine learning in medical imaging -- shortcomings and recommendations

1 code implementation • 18 Mar 2021 • Gaël Varoquaux, Veronika Cheplygina

Finally, we provide a broad range of recommendations on how to further address these problems in the future.

BIG-bench Machine Learning

Accounting for Variance in Machine Learning Benchmarks

no code implementations • 1 Mar 2021 • Xavier Bouthillier, Pierre Delaunay, Mirko Bronzi, Assya Trofimov, Brennan Nichyporuk, Justin Szeto, Naz Sepah, Edward Raff, Kanika Madan, Vikram Voleti, Samira Ebrahimi Kahou, Vincent Michalski, Dmitriy Serdyuk, Tal Arbel, Chris Pal, Gaël Varoquaux, Pascal Vincent

Strong empirical evidence that one machine-learning algorithm A outperforms another one B ideally calls for multiple trials optimizing the learning pipeline over sources of variation such as data sampling, data augmentation, parameter initialization, and hyperparameter choices.

Benchmarking BIG-bench Machine Learning +1

A Lightweight Neural Model for Biomedical Entity Linking

1 code implementation • 16 Dec 2020 • Lihu Chen, Gaël Varoquaux, Fabian M. Suchanek

Biomedical entity linking aims to map biomedical mentions, such as diseases and drugs, to standard entities in a given knowledge base.

Entity Linking

NeuMiss networks: differentiable programming for supervised learning with missing values

no code implementations • 3 Jul 2020 • Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.

Imputation

Fine-grain atlases of functional modes for fMRI analysis

no code implementations • 5 Mar 2020 • Kamalaker Dadi, Gaël Varoquaux, Antonia Machlouzarides-Shalit, Krzysztof J. Gorgolewski, Demian Wassermann, Bertrand Thirion, Arthur Mensch

We demonstrate the benefits of extracting reduced signals on our fine-grain atlases for many classic functional data analysis pipelines: stimuli decoding from 12,334 brain responses, standard GLM analysis of fMRI across sessions and individuals, extraction of resting-state functional-connectomes biomarkers for 2,500 individuals, data compression and meta-analysis over more than 15,000 statistical maps.

Data Compression

NeuroQuery: comprehensive meta-analysis of human brain mapping

no code implementations • 21 Feb 2020 • Jérôme Dockès, Russell Poldrack, Romain Primet, Hande Gözükan, Tal Yarkoni, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

Reaching a global view of brain organization requires assembling evidence on widely different mental processes and mechanisms.

Linear predictor on linearly-generated data with missing values: non consistency and solutions

1 code implementation • 3 Feb 2020 • Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and the various missing-value indicators.

Generalization Bounds

What’s in a functional brain parcellation?

no code implementations • NeurIPS Workshop Neuro_AI 2019 • Gaël Varoquaux, Kamalaker Dadi, Arthur Mensch

Here we consider atlases used to parcellate the brain when studying brain function.

Encoding high-cardinality string categorical variables

1 code implementation • 3 Jul 2019 • Patricio Cerda, Gaël Varoquaux

We introduce two encoding approaches for string categories: a Gamma-Poisson matrix factorization on substring counts, and the min-hash encoder, for fast approximation of string similarities.

AutoML Feature Engineering +1

On the consistency of supervised learning with missing values

3 code implementations • 19 Feb 2019 • Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

A striking result is that the widely-used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.

Attribute Imputation

Extracting representations of cognition across neuroimaging studies improves brain decoding

1 code implementation • 17 Sep 2018 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Analyzing data across studies could bring more statistical power; yet the current brain-imaging analytic framework cannot be used at scale as it requires casting all cognitive tasks in a unified theoretical framework.

Brain Decoding

Approximate message-passing for convex optimization with non-separable penalties

2 code implementations • 17 Sep 2018 • Andre Manoel, Florent Krzakala, Gaël Varoquaux, Bertrand Thirion, Lenka Zdeborová

We introduce an iterative optimization scheme for convex objectives consisting of a linear loss and a non-separable penalty, based on the expectation-consistent approximation and the vector approximate message-passing (VAMP) algorithm.

Text to brain: predicting the spatial distribution of neuroimaging observations from text reports

no code implementations • 4 Jun 2018 • Jérôme Dockès, Demian Wassermann, Russell Poldrack, Fabian Suchanek, Bertrand Thirion, Gaël Varoquaux

In this paper, we propose to mine brain medical publications to learn the spatial distribution associated with anatomical terms.

Similarity encoding for learning with dirty categorical variables

2 code implementations • 4 Jun 2018 • Patricio Cerda, Gaël Varoquaux, Balázs Kégl

We show that a simple approach that exposes the redundancy to the learning algorithm brings significant gains.

Dimensionality Reduction

Learning Neural Representations of Human Cognition across Many fMRI Studies

1 code implementation • NeurIPS 2017 • Arthur Mensch, Julien Mairal, Danilo Bzdok, Bertrand Thirion, Gaël Varoquaux

Cognitive neuroscience is enjoying a rapid increase in extensive public brain-imaging datasets.

Dimensionality Reduction Multi-Task Learning

Cross-validation failure: small sample sizes lead to large error bars

1 code implementation • 23 Jun 2017 • Gaël Varoquaux

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers.

Subsampled online matrix factorization with convergence guarantees

1 code implementation • 30 Nov 2016 • Arthur Mensch, Julien Mairal, Gaël Varoquaux, Bertrand Thirion

We present a matrix factorization algorithm that scales to input matrices that are large in both dimensions (i.e., that contain more than 1TB of data).

Deriving reproducible biomarkers from multi-site resting-state data: An Autism-based example

no code implementations • 18 Nov 2016 • Alexandre Abraham, Michael Milham, Adriana Di Martino, R. Cameron Craddock, Dimitris Samaras, Bertrand Thirion, Gaël Varoquaux

These R-fMRI pipelines build participant-specific connectomes from functionally-defined brain areas.

General Classification

Recursive nearest agglomeration (ReNA): fast clustering for approximation of structured signals

1 code implementation • 15 Sep 2016 • Andrés Hoyos-Idrobo, Gaël Varoquaux, Jonas Kahn, Bertrand Thirion

Our goal is to summarize the data to decrease computational costs and memory footprint of subsequent analysis.

Clustering Denoising +1

Social-sparsity brain decoders: faster spatial sparsity

no code implementations • 21 Jun 2016 • Gaël Varoquaux, Matthieu Kowalski, Bertrand Thirion

Spatially-sparse predictors are good models for brain decoding: they give accurate predictions and their weight maps are interpretable as they focus on a small number of regions.

Brain Decoding General Classification

Assessing and tuning brain decoders: cross-validation, caveats, and guidelines

1 code implementation • 16 Jun 2016 • Gaël Varoquaux, Pradeep Reddy Raamana, Denis Engemann, Andrés Hoyos-Idrobo, Yannick Schwartz, Bertrand Thirion

Decoding, i.e., prediction from brain images or signals, calls for empirical evaluation of its predictive power.

Learning to Discover Sparse Graphical Models

1 code implementation • ICML 2017 • Eugene Belilovsky, Kyle Kastner, Gaël Varoquaux, Matthew Blaschko

Learning this function brings two benefits: it implicitly models the desired structure or sparsity properties to form suitable priors, and it can be tailored to the specific problem of edge structure discovery, rather than maximizing data likelihood.

Dictionary Learning for Massive Matrix Factorization

1 code implementation • 3 May 2016 • Arthur Mensch, Julien Mairal, Bertrand Thirion, Gaël Varoquaux

Sparse matrix factorization is a popular tool to obtain interpretable data decompositions, which are also effective to perform data completion or denoising.

Ranked #12 on Recommendation Systems on MovieLens 10M

Collaborative Filtering Dictionary Learning +2

Compressed Online Dictionary Learning for Fast fMRI Decomposition

no code implementations • 8 Feb 2016 • Arthur Mensch, Gaël Varoquaux, Bertrand Thirion

We present a method for fast resting-state fMRI spatial decompositions of very large datasets, based on the reduction of the temporal dimension before applying dictionary learning on concatenated individual records from groups of subjects.

Dictionary Learning

Testing for Differences in Gaussian Graphical Models: Applications to Brain Connectivity

no code implementations • NeurIPS 2016 • Eugene Belilovsky, Gaël Varoquaux, Matthew B. Blaschko

We characterize the uncertainty of differences with confidence intervals obtained using a parametric distribution on parameters of a sparse estimator.

FAASTA: A fast solver for total-variation regularization of ill-conditioned problems with application to brain imaging

no code implementations • 22 Dec 2015 • Gaël Varoquaux, Michael Eickenberg, Elvis Dohmatob, Bertrand Thirion

The total variation (TV) penalty, as many other analysis-sparsity problems, does not lead to separable factors or a proximal operator with a closed-form expression, such as soft thresholding for the $\ell_1$ penalty.

Brain Decoding

Mapping cognitive ontologies to and from the brain

no code implementations • 15 Nov 2013 • Yannick Schwartz, Bertrand Thirion, Gaël Varoquaux

Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies.

API design for machine learning software: experiences from the scikit-learn project

4 code implementations • 1 Sep 2013 • Lars Buitinck, Gilles Louppe, Mathieu Blondel, Fabian Pedregosa, Andreas Mueller, Olivier Grisel, Vlad Niculae, Peter Prettenhofer, Alexandre Gramfort, Jaques Grobler, Robert Layton, Jake Vanderplas, Arnaud Joly, Brian Holt, Gaël Varoquaux

Scikit-learn is an increasingly popular machine learning library.

BIG-bench Machine Learning

Scikit-learn: Machine Learning in Python

3 code implementations • 2 Jan 2012 • Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Andreas Müller, Joel Nothman, Gilles Louppe, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, Édouard Duchesnay

Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems.

BIG-bench Machine Learning Clustering +3
