Search Results for author: Gael Varoquaux

Found 18 papers, 8 papers with code

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

1 code implementation ACL 2022 Lihu Chen, Gael Varoquaux, Fabian Suchanek

State-of-the-art NLP systems represent inputs with word embeddings, but these are brittle when faced with Out-of-Vocabulary (OOV) words. To address this issue, we follow the principle of mimick-like models to generate vectors for unseen words, by learning the behavior of pre-trained embeddings using only the surface form of words. We present a simple contrastive learning framework, LOVE, which extends the word representation of an existing pre-trained language model (such as BERT) and makes it robust to OOV with few additional parameters. Extensive evaluations demonstrate that our lightweight model achieves similar or even better performance than prior competitors, both on original datasets and on corrupted variants.

Contrastive Learning Language Modelling +1

Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

1 code implementation 9 Feb 2024 Riccardo Cappuzzo, Gael Varoquaux, Aimee Coelho, Paolo Papotti

We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks.

Benchmarking

The Past, Present, and Future of the Brain Imaging Data Structure (BIDS)

no code implementations 11 Sep 2023 Russell A. Poldrack, Christopher J. Markiewicz, Stefan Appelhoff, Yoni K. Ashar, Tibor Auer, Sylvain Baillet, Shashank Bansal, Leandro Beltrachini, Christian G. Benar, Giacomo Bertazzoli, Suyash Bhogawar, Ross W. Blair, Marta Bortoletto, Mathieu Boudreau, Teon L. Brooks, Vince D. Calhoun, Filippo Maria Castelli, Patricia Clement, Alexander L Cohen, Julien Cohen-Adad, Sasha D'Ambrosio, Gilles de Hollander, María de la Iglesia-Vayá, Alejandro de la Vega, Arnaud Delorme, Orrin Devinsky, Dejan Draschkow, Eugene Paul Duff, Elizabeth Dupre, Eric Earl, Oscar Esteban, Franklin W. Feingold, Guillaume Flandin, Anthony Galassi, Giuseppe Gallitto, Melanie Ganz, Rémi Gau, James Gholam, Satrajit S. Ghosh, Alessio Giacomel, Ashley G Gillman, Padraig Gleeson, Alexandre Gramfort, Samuel Guay, Giacomo Guidali, Yaroslav O. Halchenko, Daniel A. Handwerker, Nell Hardcastle, Peer Herholz, Dora Hermes, Christopher J. Honey, Robert B. Innis, Horea-Ioan Ioanas, Andrew Jahn, Agah Karakuzu, David B. Keator, Gregory Kiar, Balint Kincses, Angela R. Laird, Jonathan C. Lau, Alberto Lazari, Jon Haitz Legarreta, Adam Li, Xiangrui Li, Bradley C. Love, Hanzhang Lu, Camille Maumet, Giacomo Mazzamuto, Steven L. Meisler, Mark Mikkelsen, Henk Mutsaerts, Thomas E. Nichols, Aki Nikolaidis, Gustav Nilsonne, Guiomar Niso, Martin Norgaard, Thomas W Okell, Robert Oostenveld, Eduard Ort, Patrick J. Park, Mateusz Pawlik, Cyril R. Pernet, Franco Pestilli, Jan Petr, Christophe Phillips, Jean-Baptiste Poline, Luca Pollonini, Pradeep Reddy Raamana, Petra Ritter, Gaia Rizzo, Kay A. Robbins, Alexander P. Rockhill, Christine Rogers, Ariel Rokem, Chris Rorden, Alexandre Routier, Jose Manuel Saborit-Torres, Taylor Salo, Michael Schirner, Robert E. Smith, Tamas Spisak, Julia Sprenger, Nicole C. Swann, Martin Szinte, Sylvain Takerkart, Bertrand Thirion, Adam G. Thomas, Sajjad Torabian, Gael Varoquaux, Bradley Voytek, Julius Welzel, Martin Wilson, Tal Yarkoni, Krzysztof J. Gorgolewski

The Brain Imaging Data Structure (BIDS) is a community-driven standard for the organization of data and metadata from a growing range of neuroscience modalities.

Why do tree-based models still outperform deep learning on typical tabular data?

1 code implementation NeurIPS 2022 Leo Grinsztajn, Edouard Oyallon, Gael Varoquaux

While deep learning has enabled tremendous progress on text and image datasets, its superiority on tabular data is not clear.

Benchmarking

What’s a good imputation to predict with missing values?

no code implementations NeurIPS 2021 Marine Le Morvan, Julie Josse, Erwan Scornet, Gael Varoquaux

In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.

Imputation regression

AI as statistical methods for imperfect theories

no code implementations NeurIPS Workshop AI4Scien 2021 Gael Varoquaux

Science has progressed by reasoning on what models could not predict because they were missing important ingredients.

BIG-bench Machine Learning

NeuMiss networks: differentiable programming for supervised learning with missing values.

no code implementations NeurIPS 2020 Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gael Varoquaux

We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.

Imputation

Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing

1 code implementation NeurIPS 2019 Meyer Scetbon, Gael Varoquaux

Here, we show that $L^p$ distances (with $p\geq 1$) between these distribution representatives give metrics on the space of distributions that are well-behaved to detect differences between distributions as they metrize the weak convergence.

Two-sample testing Vocal Bursts Valence Prediction

Computational and informatics advances for reproducible data analysis in neuroimaging

no code implementations 24 Sep 2018 Russell A. Poldrack, Krzysztof J. Gorgolewski, Gael Varoquaux

We argue that openness and transparency are critical for reproducibility, and we outline an ecosystem for open and transparent science that has emerged within the human neuroimaging community.

Feature Grouping as a Stochastic Regularizer for High-Dimensional Structured Data

1 code implementation 31 Jul 2018 Sergul Aydore, Bertrand Thirion, Gael Varoquaux

In many applications where collecting data is expensive, for example neuroscience or medical imaging, the sample size is typically small compared to the feature dimension.

Clustering Denoising +2

Stochastic Subsampling for Factorizing Huge Matrices

1 code implementation 19 Jan 2017 Arthur Mensch, Julien Mairal, Bertrand Thirion, Gael Varoquaux

We present a matrix-factorization algorithm that scales to input matrices with huge numbers of both rows and columns.

Dictionary Learning

Learning brain regions via large-scale online structured sparse dictionary learning

no code implementations NeurIPS 2016 Elvis Dohmatob, Arthur Mensch, Gael Varoquaux, Bertrand Thirion

We propose a multivariate online dictionary-learning method for obtaining decompositions of brain images with structured and sparse components (aka atoms).

Dictionary Learning

Fast clustering for scalable statistical analysis on structured images

no code implementations 16 Nov 2015 Bertrand Thirion, Andrés Hoyos-Idrobo, Jonas Kahn, Gael Varoquaux

The use of brain images as markers for diseases or behavioral differences is challenged by small effect sizes and the ensuing lack of power, an issue that has incited researchers to rely more systematically on large cohorts.

Clustering Computational Efficiency +1

Region segmentation for sparse decompositions: better brain parcellations from rest fMRI

no code implementations 12 Dec 2014 Alexandre Abraham, Elvis Dohmatob, Bertrand Thirion, Dimitris Samaras, Gael Varoquaux

Functional Magnetic Resonance Images acquired during resting-state provide information about the functional organization of the brain through measuring correlations between brain areas.

Mapping paradigm ontologies to and from the brain

no code implementations NeurIPS 2013 Yannick Schwartz, Bertrand Thirion, Gael Varoquaux

Imaging neuroscience links brain activation maps to behavior and cognition via correlational studies.

Brain covariance selection: better individual functional connectivity models using population prior

no code implementations NeurIPS 2010 Gael Varoquaux, Alexandre Gramfort, Jean-Baptiste Poline, Bertrand Thirion

We describe subject-level brain functional connectivity structure as a multivariate Gaussian process and introduce a new strategy to estimate it from group data, by imposing a common structure on the graphical model in the population.
