Search Results for author: Erwan Scornet

Found 25 papers, 8 papers with code

Do we need rebalancing strategies? A theoretical and empirical study around SMOTE and its variants

1 code implementation • 6 Feb 2024 Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet

Surprisingly, for most data sets, we observe that applying no rebalancing strategy is competitive in terms of predictive performance when using tuned random forests, logistic regression, or LightGBM.
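
As a quick illustration of the comparison the paper runs (a sketch on synthetic data, not the authors' benchmark; a random forest stands in for the tuned learners), one can pit a plain forest against the same forest trained on SMOTE-resampled data:

```python
# Sketch: "no rebalancing" baseline vs. SMOTE oversampling on an
# imbalanced synthetic task. Data and settings are illustrative.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Baseline: no rebalancing at all.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
auc_none = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

# SMOTE: oversample the minority class before fitting.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
rf_sm = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_sm, y_sm)
auc_smote = roc_auc_score(y_te, rf_sm.predict_proba(X_te)[:, 1])

print(f"AUC without rebalancing: {auc_none:.3f}, with SMOTE: {auc_smote:.3f}")
```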

Random features models: a way to study the success of naive imputation

no code implementations • 6 Feb 2024 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Constant (naive) imputation is still widely used in practice, as it is an easy-to-use first technique for dealing with missing data.

Imputation
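
For reference, constant imputation of the kind studied here is a one-liner in scikit-learn; the data below is synthetic and purely illustrative:

```python
# Minimal sketch of "naive" constant imputation before a downstream learner.
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = X @ rng.normal(size=10) + rng.normal(size=1000)
X[rng.random(X.shape) < 0.2] = np.nan  # 20% of entries missing at random

# Impute every missing entry with the constant 0, then fit a linear model.
model = make_pipeline(SimpleImputer(strategy="constant", fill_value=0.0), Ridge())
model.fit(X, y)
```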

Sparse tree-based initialization for neural networks

no code implementations • 30 Sep 2022 Patrick Lutz, Ludovic Arnould, Claire Boyer, Erwan Scornet

Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among state-of-the-art methods for dealing with such data.

Minimax rate of consistency for linear models with missing values

no code implementations • 3 Feb 2022 Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet

Missing values arise in most real-world data sets due to the aggregation of multiple sources and intrinsically missing information (sensor failure, unanswered questions in surveys...).

Missing Values

What's a good imputation to predict with missing values?

no code implementations NeurIPS 2021 Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.

Imputation Missing Values +1

What's a good imputation to predict with missing values?

1 code implementation • 1 Jun 2021 Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux

In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.

Imputation Missing Values +1
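
A minimal impute-then-regress sketch (illustrative data and model choices, not the paper's experiments): since the optimal predictor after imputation is generally discontinuous, it helps to follow the imputer with a flexible learner rather than a linear one.

```python
# Mean imputation followed by gradient-boosted trees, which can fit the
# discontinuities that imputation induces in the optimal predictor.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.1, size=2000)
X[rng.random(X.shape) < 0.3] = np.nan  # 30% of entries missing

model = make_pipeline(SimpleImputer(strategy="mean"), HistGradientBoostingRegressor())
model.fit(X, y)
```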

SHAFF: Fast and consistent SHApley eFfect estimates via random Forests

1 code implementation • 25 May 2021 Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet

Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools.

MDA for random forests: inconsistency, and a practical solution via the Sobol-MDA

no code implementations • 26 Feb 2021 Clément Bénard, Sébastien da Veiga, Erwan Scornet

Variable importance measures are the main tools to analyze the black-box mechanisms of random forests.

Variable Selection
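
The classical MDA analyzed here corresponds, in scikit-learn terms, to permutation importance on held-out data; a minimal sketch follows (the Sobol-MDA correction itself is in the authors' code, not shown here):

```python
# MDA-style importance: permute each feature on a test set and measure
# the drop in predictive performance.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=8, n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_tr, y_tr)
result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
print(result.importances_mean)  # one MDA-style score per input variable
```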

NeuMiss networks: differentiable programming for supervised learning with missing values

no code implementations NeurIPS 2020 Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.

Imputation Missing Values

Analyzing the tree-layer structure of Deep Forests

no code implementations • 29 Oct 2020 Ludovic Arnould, Claire Boyer, Erwan Scornet

Random forests on the one hand, and neural networks on the other, have met with great success in the machine learning community for their predictive performance.

NeuMiss networks: differentiable programming for supervised learning with missing values

no code implementations • 3 Jul 2020 Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux

We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.

Imputation Missing Values
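
To unpack the name: NeuMiss unrolls a Neumann series, which approximates a matrix inverse using only matrix-vector products (one network layer per term). The numpy sketch below shows the series itself, not the authors' architecture:

```python
# Truncated Neumann iteration x_{k+1} = b + (I - A) x_k, whose fixed
# point satisfies A x = b, i.e. x -> A^{-1} b when the series converges.
import numpy as np

rng = np.random.default_rng(0)
A = np.eye(5) + 0.1 * rng.normal(size=(5, 5))  # spectrum near 1, so the series converges
b = rng.normal(size=5)

x = b.copy()
for _ in range(20):
    x = b + x - A @ x

print(np.allclose(x, np.linalg.solve(A, b), atol=1e-6))  # True
```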

Interpretable Random Forests via Rule Extraction

no code implementations • 29 Apr 2020 Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet

We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm which takes the form of a short and simple list of rules.

Linear predictor on linearly-generated data with missing values: non consistency and solutions

1 code implementation • 3 Feb 2020 Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux

In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and the various missing-value indicators.

Generalization Bounds Missing Values
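
A crude, low-order rendering of that expansion (illustrative only, with zero-imputation and pairwise feature-mask products standing in for the full multiway interactions):

```python
# Augment zero-imputed features with the missingness mask and pairwise
# feature-mask interactions, then fit an ordinary linear model.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(3000, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(size=3000)
M = rng.random(X.shape) < 0.25      # missingness mask
X_obs = np.where(M, 0.0, X)         # zero-impute missing entries

# Interactions between observed values and the missing-value indicators.
inter = np.concatenate([X_obs * M[:, [j]] for j in range(4)], axis=1)
features = np.concatenate([X_obs, M.astype(float), inter], axis=1)
LinearRegression().fit(features, y)
```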

Trees, forests, and impurity-based variable importance

no code implementations • 13 Jan 2020 Erwan Scornet

We prove that if input variables are independent and interactions are absent, MDI provides a variance decomposition of the output in which the contribution of each variable is clearly identified.

Decision Making
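
scikit-learn's feature_importances_ is a normalized MDI, so the favorable regime of the paper (independent inputs, no interactions) is easy to reproduce on synthetic data; a sketch, not the paper's experiments:

```python
# MDI under independent inputs and an additive model: importances should
# roughly track each variable's share of the output variance.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 4))                               # independent inputs
y = 3.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(size=5000)   # no interactions

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
print(rf.feature_importances_)  # normalized MDI, one value per variable
```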

SIRUS: Stable and Interpretable RUle Set for Classification

no code implementations • 19 Aug 2019 Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet

State-of-the-art learning algorithms, such as random forests or neural networks, are often qualified as "black-boxes" because of the high number and complexity of operations involved in their prediction mechanism.

Classification General Classification
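
A toy rendering of the frequent-path idea behind SIRUS (not the algorithm itself, which in particular stabilizes thresholds on empirical quantiles): fit a forest of shallow trees and keep the splits that recur most often.

```python
# Extract the root split of each shallow tree as a candidate rule and
# count how often each rule recurs across the forest.
from collections import Counter

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
rf = RandomForestClassifier(n_estimators=500, max_depth=2, random_state=0).fit(X, y)

rules = Counter()
for est in rf.estimators_:
    t = est.tree_
    # Root node: feature index and (crudely rounded) threshold of the first split.
    rules[(int(t.feature[0]), round(float(t.threshold[0]), 1))] += 1

for (feat, thr), count in rules.most_common(3):
    print(f"if x[{feat}] <= {thr}: ...  (appears in {count} trees)")
```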

AMF: Aggregated Mondrian Forests for Online Learning

2 code implementations • 25 Jun 2019 Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet

Using a variant of the Context Tree Weighting algorithm, we show that it is possible to efficiently perform an exact aggregation over all prunings of the trees; in particular, this yields a truly online, parameter-free algorithm that is competitive with the optimal pruning of the Mondrian tree, and thus adaptive to the unknown regularity of the regression function.

General Classification Multi-class Classification +1
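
A hypothetical usage sketch, assuming the AMFClassifier API (n_classes, partial_fit, predict_proba) of the onelearn package released alongside the paper; check the repository for the exact signatures:

```python
# Online learning with Aggregated Mondrian Forests: feed the stream in
# batches, no retraining and no pruning level to tune (the aggregation
# over all prunings is done internally).
import numpy as np
from onelearn import AMFClassifier  # assumed API from the onelearn package
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, n_classes=2, random_state=0)
amf = AMFClassifier(n_classes=2, n_estimators=10, random_state=0)

for X_batch, y_batch in zip(np.array_split(X, 20), np.array_split(y, 20)):
    amf.partial_fit(X_batch, y_batch)

proba = amf.predict_proba(X[:5])
```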

On the consistency of supervised learning with missing values

3 code implementations • 19 Feb 2019 Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux

A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.

Attribute Imputation +1

Minimax optimal rates for Mondrian trees and forests

no code implementations • 15 Mar 2018 Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet

Our results include consistency and convergence rates for Mondrian Trees and Forests, which turn out to be minimax optimal on the set of $s$-Hölder functions with $s \in (0, 1]$ (for trees and forests) and $s \in (1, 2]$ (for forests only), assuming a proper tuning of their complexity parameter in both cases.

Universal consistency and minimax rates for online Mondrian Forests

no code implementations NeurIPS 2017 Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet

We establish the consistency of Mondrian Forests, a randomized classification algorithm that can be implemented online.

General Classification regression

Neural Random Forests

2 code implementations • 25 Apr 2016 Gérard Biau, Erwan Scornet, Johannes Welbl

Given an ensemble of randomized regression trees, it is possible to restructure them as a collection of multilayered neural networks with particular connection weights.

regression
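
A sketch of the first step of such a restructuring as I read it (one first-layer unit per internal split node; an illustration, not the authors' code):

```python
# Turn the splits of a fitted tree into a first neural-network layer:
# each internal node contributes a unit computing x[feature] - threshold,
# passed through a smooth relaxation of the split indicator.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=500, n_features=5, random_state=0)
tree = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X, y)

t = tree.tree_
internal = np.where(t.children_left != -1)[0]   # internal (split) nodes
W = np.zeros((len(internal), X.shape[1]))       # first-layer weights
b = np.zeros(len(internal))                     # first-layer biases
for row, node in enumerate(internal):
    W[row, t.feature[node]] = 1.0
    b[row] = -t.threshold[node]

H = np.tanh(X @ W.T + b)  # smooth relaxation of the split indicators
```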

Impact of subsampling and pruning on random forests

1 code implementation • 14 Mar 2016 Roxane Duroux, Erwan Scornet

Random forests are ensemble learning methods introduced by Breiman (2001) that operate by averaging several decision trees built on a randomly selected subspace of the data set.

Statistics Theory

A Random Forest Guided Tour

no code implementations • 18 Nov 2015 Gérard Biau, Erwan Scornet

The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method.

General Classification

Random forests and kernel methods

no code implementations • 12 Feb 2015 Erwan Scornet

In particular, we show that by slightly modifying their definition, random forests can be rewritten as kernel methods (called KeRF for Kernel based on Random Forests) which are more interpretable and easier to analyze.

Statistics Theory
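
A minimal KeRF-style sketch: take K(x, z) to be the fraction of trees in which x and z fall in the same leaf, and predict with a kernel-weighted average of the training targets (illustrative, using scikit-learn's apply to read off leaf memberships):

```python
# Forest kernel and kernel-weighted prediction from a fitted random forest.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

leaves_train = rf.apply(X)      # (n_samples, n_trees) leaf indices
x_new = X[:3]
leaves_new = rf.apply(x_new)

# K[i, j] = fraction of trees where x_new[i] and X[j] share a leaf.
K = (leaves_new[:, None, :] == leaves_train[None, :, :]).mean(axis=2)
kerf_pred = (K @ y) / K.sum(axis=1)
```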

On the asymptotics of random forests

no code implementations • 7 Sep 2014 Erwan Scornet

The last decade has witnessed a growing interest in random forest models which are recognized to exhibit good practical performance, especially in high-dimensional settings.

Statistics Theory

Consistency of random forests

no code implementations • 12 May 2014 Erwan Scornet, Gérard Biau, Jean-Philippe Vert

What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune.

Ensemble Learning regression
