1 code implementation • 6 Feb 2024 • Abdoulaye Sakho, Emmanuel Malherbe, Erwan Scornet
Surprisingly, for most data sets, we observe that applying no rebalancing strategy is competitive in terms of predictive performance when combined with tuned random forests, logistic regression, or LightGBM.
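As a rough illustration of such a comparison, here is a minimal sketch contrasting no rebalancing against SMOTE oversampling; the dataset, model, and metric are illustrative choices, not the paper's protocol.

```python
# Minimal sketch: "no rebalancing" vs. SMOTE on an imbalanced task.
# Dataset, model, and metric are illustrative, not the paper's protocol.
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Strategy 1: no rebalancing, just a (tunable) random forest.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)

# Strategy 2: oversample the minority class with SMOTE before fitting.
X_sm, y_sm = SMOTE(random_state=0).fit_resample(X_tr, y_tr)
rf_sm = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_sm, y_sm)

for name, model in [("none", rf), ("SMOTE", rf_sm)]:
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"rebalancing={name}: ROC AUC={auc:.3f}")
```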
no code implementations • 6 Feb 2024 • Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet
Constant (naive) imputation is still widely used in practice, as it is an easy first technique for dealing with missing data.
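For concreteness, here is what constant imputation looks like with scikit-learn's SimpleImputer; the fill value is an arbitrary illustrative choice.

```python
# Constant (naive) imputation: every missing entry is replaced by a
# single fixed value before the downstream model is fit.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, np.nan], [3.0, 4.0], [np.nan, 6.0]])

# Replace each NaN by the constant 0 (the value itself is illustrative).
imp = SimpleImputer(strategy="constant", fill_value=0.0)
print(imp.fit_transform(X))
```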
no code implementations • 30 Sep 2022 • Patrick Lutz, Ludovic Arnould, Claire Boyer, Erwan Scornet
Dedicated neural network (NN) architectures have been designed to handle specific data types (such as CNNs for images or RNNs for text), which ranks them among the state-of-the-art methods for these data.
no code implementations • 3 Feb 2022 • Alexis Ayme, Claire Boyer, Aymeric Dieuleveut, Erwan Scornet
Missing values arise in most real-world data sets, due to the aggregation of multiple sources and to intrinsically missing information (sensor failures, unanswered survey questions, etc.).
no code implementations • NeurIPS 2021 • Marine Le Morvan, Julie Josse, Erwan Scornet, Gael Varoquaux
In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.
1 code implementation • 1 Jun 2021 • Marine Le Morvan, Julie Josse, Erwan Scornet, Gaël Varoquaux
In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn.
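A toy one-dimensional example (mine, not taken from the paper) of why such a discontinuity arises under constant imputation:

```latex
% Toy example: Y = X with X continuous, M ~ Bernoulli(p) independent of
% X (MCAR), and missing entries imputed by a constant c.
\[
\widetilde X = X\,\mathbf{1}_{M=0} + c\,\mathbf{1}_{M=1},
\qquad
f^\star(x) = \mathbb{E}\big[Y \mid \widetilde X = x\big] =
\begin{cases}
x & \text{if } x \neq c,\\
\mathbb{E}[X] & \text{if } x = c,
\end{cases}
\]
% since conditioning on X~ = c selects M = 1 almost surely. The Bayes
% predictor therefore jumps at the imputed value c whenever E[X] != c.
```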
1 code implementation • 25 May 2021 • Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet
Interpretability of learning algorithms is crucial for applications involving critical decisions, and variable importance is one of the main interpretation tools.
no code implementations • 26 Feb 2021 • Clément Bénard, Sébastien da Veiga, Erwan Scornet
Variable importance measures are the main tools to analyze the black-box mechanisms of random forests.
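For reference, the two classical forest importance measures that such analyses target can be computed as follows; the dataset is illustrative.

```python
# The two standard random-forest importance measures: Mean Decrease in
# Impurity (MDI, scikit-learn's feature_importances_) and permutation
# importance (a.k.a. Mean Decrease in Accuracy).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=1000, n_features=5, n_informative=2,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

print("MDI:", rf.feature_importances_)
perm = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print("Permutation:", perm.importances_mean)
```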
no code implementations • NeurIPS 2020 • Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gael Varoquaux
We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
no code implementations • 29 Oct 2020 • Ludovic Arnould, Claire Boyer, Erwan Scornet
Random forests, on the one hand, and neural networks, on the other, have met with great success in the machine learning community thanks to their predictive performance.
no code implementations • 3 Jul 2020 • Marine Le Morvan, Julie Josse, Thomas Moreau, Erwan Scornet, Gaël Varoquaux
We provide an upper bound on the Bayes risk of NeuMiss networks, and show that they have good predictive accuracy with both a number of parameters and a computational complexity independent of the number of missing data patterns.
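The core numerical trick, as a plain NumPy sketch: NeuMiss unrolls a Neumann-series iteration to approximate the inverse of the covariance restricted to the observed coordinates, avoiding a separate matrix inversion per missingness pattern. This is only the iteration, not the full network.

```python
# Neumann-series iteration to approximate S^{-1} v without an explicit
# matrix inverse: h_{k+1} = v + (I - S) h_k. Plain NumPy sketch.
import numpy as np

def neumann_solve(S, v, depth=200):
    """Approximate S^{-1} v; converges when the spectral radius of
    (I - S) is < 1, e.g. after rescaling an SPD matrix S so that its
    eigenvalues lie in (0, 1]."""
    h = v.copy()
    for _ in range(depth):
        h = v + h - S @ h
    return h

rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
S = A @ A.T / 4 + np.eye(4)      # well-conditioned SPD matrix
S /= np.linalg.norm(S, 2)        # rescale eigenvalues into (0, 1]
v = rng.normal(size=4)
print(np.allclose(neumann_solve(S, v), np.linalg.solve(S, v)))
```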
no code implementations • 29 Apr 2020 • Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet
We introduce SIRUS (Stable and Interpretable RUle Set) for regression, a stable rule learning algorithm which takes the form of a short and simple list of rules.
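As a toy picture (not SIRUS itself) of what a short rule list looks like for regression, with made-up rules and values:

```python
# Toy rule list for regression, in the spirit of SIRUS: each rule maps a
# simple condition to an if/else pair of outputs, and the prediction
# averages the rules. Rules and thresholds below are made up, not
# extracted by the SIRUS algorithm.
rules = [
    (lambda x: x["age"] < 30,      12.0, 25.0),   # if condition: 12, else 25
    (lambda x: x["income"] > 50e3, 28.0, 15.0),
    (lambda x: x["tenure"] < 2,    10.0, 22.0),
]

def predict(x):
    outputs = [then_v if cond(x) else else_v for cond, then_v, else_v in rules]
    return sum(outputs) / len(outputs)

print(predict({"age": 25, "income": 60e3, "tenure": 5}))  # (12+28+22)/3
```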
1 code implementation • 3 Feb 2020 • Marine Le Morvan, Nicolas Prost, Julie Josse, Erwan Scornet, Gaël Varoquaux
In the particular Gaussian case, it can be written as a linear function of multiway interactions between the observed data and the various missing-value indicators.
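Schematically, in my notation, the structure in question can be written as a pattern-wise linear model whose expansion produces the multiway interaction terms:

```latex
% Schematic form: conditionally on each missingness pattern m in {0,1}^d,
% the Bayes predictor is linear in the observed coordinates,
\[
\mathbb{E}\!\left[Y \mid X_{\mathrm{obs}(M)},\, M = m\right]
  = \beta_0^{(m)} + \big\langle \beta^{(m)}, x_{\mathrm{obs}(m)} \big\rangle ,
\]
% and summing these pattern-wise models against each pattern's indicator,
% prod_j m_j^{m_j} (1 - m_j)^{1 - m_j}, expands into one linear function
% of terms of the form x_j * prod_{k in S} m_k over subsets S.
```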
no code implementations • 13 Jan 2020 • Erwan Scornet
We prove that if input variables are independent and in the absence of interactions, MDI provides a variance decomposition of the output in which the contribution of each variable is clearly identified.
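The flavor of the result, stated schematically in my notation (see the paper for the exact assumptions):

```latex
% Schematic statement: with independent inputs and an additive (no
% interaction) model Y = sum_j f_j(X_j) + eps, the population MDI of each
% variable identifies that variable's own contribution to the variance:
\[
\mathrm{MDI}^\star(X_j) = \mathrm{Var}\big(f_j(X_j)\big),
\qquad
\sum_{j=1}^{d} \mathrm{MDI}^\star(X_j)
  = \mathrm{Var}\big(\mathbb{E}[Y \mid X]\big).
\]
```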
no code implementations • 19 Aug 2019 • Clément Bénard, Gérard Biau, Sébastien da Veiga, Erwan Scornet
State-of-the-art learning algorithms, such as random forests or neural networks, are often qualified as "black-boxes" because of the high number and complexity of operations involved in their prediction mechanism.
2 code implementations • 25 Jun 2019 • Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet
Using a variant of the Context Tree Weighting algorithm, we show that an exact aggregation over all prunings of the tree can be performed efficiently; in particular, this yields a truly online, parameter-free algorithm that is competitive with the optimal pruning of the Mondrian tree, and thus adaptive to the unknown regularity of the regression function.
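The underlying trick can be summarized by the standard CTW-style per-node recursion (schematic form, not the paper's exact statement):

```latex
% Each node v keeps a weight mixing "stop here" (use the node's own
% predictor, with cumulative loss L_v) against "keep splitting" (the
% product of the children's weights):
\[
w_v \;=\; \tfrac{1}{2}\, e^{-\eta L_v}
      \;+\; \tfrac{1}{2}\, w_{\mathrm{left}(v)}\, w_{\mathrm{right}(v)},
\]
% so the root weight aggregates over all prunings of the tree in one
% bottom-up pass, instead of enumerating exponentially many prunings.
```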
3 code implementations • 19 Feb 2019 • Julie Josse, Jacob M. Chen, Nicolas Prost, Erwan Scornet, Gaël Varoquaux
A striking result is that the widely used method of imputing with a constant, such as the mean, prior to learning is consistent when missing values are not informative.
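The recipe this result justifies, as a scikit-learn pipeline sketch with illustrative data and model:

```python
# Mean-impute-then-learn as a single pipeline: the setting in which the
# consistency result applies. Data and model are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X[:, 0] + X[:, 1] ** 2
X[rng.random(X.shape) < 0.2] = np.nan   # 20% MCAR missing entries

model = make_pipeline(SimpleImputer(strategy="mean"),
                      RandomForestRegressor(n_estimators=100, random_state=0))
model.fit(X, y)
print(model.predict(X[:3]))
```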
no code implementations • 15 Mar 2018 • Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet
Our results include consistency and convergence rates for Mondrian Trees and Forests, which turn out to be minimax optimal over the set of $s$-Hölder functions with $s \in (0, 1]$ (for trees and forests) and $s \in (1, 2]$ (for forests only), assuming a proper tuning of their complexity parameter in both cases.
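For reference, the minimax rate in question is the standard nonparametric one over $s$-Hölder balls in dimension $d$:

```latex
% The minimax rate over s-Holder functions in dimension d, which
% Mondrian Forests attain for s in (0, 2] under proper tuning:
\[
\inf_{\widehat f}\ \sup_{f \in \mathcal{H}(s, L)}
\mathbb{E}\big\|\widehat f - f\big\|_2^2 \;\asymp\; n^{-\frac{2s}{2s + d}} .
\]
```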
no code implementations • NeurIPS 2017 • Jaouad Mourtada, Stéphane Gaïffas, Erwan Scornet
We establish the consistency of Mondrian Forests, a randomized classification algorithm that can be implemented online.
2 code implementations • 25 Apr 2016 • Gérard Biau, Erwan Scornet, Johannes Welbl
Given an ensemble of randomized regression trees, it is possible to restructure them as a collection of multilayered neural networks with particular connection weights.
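A hand-built miniature of this restructuring (my construction following the general idea, with hard-threshold units for clarity; smooth activations would make the weights trainable):

```python
# One hand-built tree as a two-hidden-layer network: the first layer
# computes one split indicator per internal node, the second layer
# combines them into leaf-membership indicators, and the output layer
# holds the leaf values.
import numpy as np

step = lambda z: (z > 0).astype(float)

def tree_as_network(x):
    # Layer 1 -- split units: u_k = 1 iff the k-th split sends x left.
    u = step(np.array([0.5 - x[0],      # root split:       x0 < 0.5
                       0.3 - x[1]]))    # right-child split: x1 < 0.3
    # Layer 2 -- leaf units: one indicator per root-to-leaf path.
    W2 = np.array([[ 1.0,  0.0],    # leaf A: left of root
                   [-1.0,  1.0],    # leaf B: right of root, left of node 2
                   [-1.0, -1.0]])   # leaf C: right of root, right of node 2
    b2 = np.array([-0.5, -0.5, 0.5])
    leaves = step(W2 @ u + b2)
    # Output layer: dot the leaf-membership vector with the leaf values.
    return np.array([1.0, 2.0, 3.0]) @ leaves

print(tree_as_network(np.array([0.2, 0.9])))  # leaf A -> 1.0
print(tree_as_network(np.array([0.9, 0.1])))  # leaf B -> 2.0
print(tree_as_network(np.array([0.9, 0.9])))  # leaf C -> 3.0
```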
1 code implementation • 14 Mar 2016 • Roxane Duroux, Erwan Scornet
Random forests are ensemble learning methods introduced by Breiman (2001) that operate by averaging several decision trees built on a randomly selected subspace of the data set.
Statistics Theory
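A minimal sketch of this mechanism, with illustrative subsample size and tree count:

```python
# Forest as an average of trees, each fit on a random subsample of the
# data. Subsample size and tree count are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))
y = np.sin(4 * X[:, 0]) + rng.normal(scale=0.1, size=1000)

trees = []
for _ in range(50):
    idx = rng.choice(len(X), size=200, replace=False)   # random subsample
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# The forest prediction is the plain average of the trees' predictions.
forest_pred = np.mean([t.predict(X[:5]) for t in trees], axis=0)
print(forest_pred)
```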
no code implementations • 18 Nov 2015 • Gérard Biau, Erwan Scornet
The random forest algorithm, proposed by L. Breiman in 2001, has been extremely successful as a general-purpose classification and regression method.
no code implementations • 12 Feb 2015 • Erwan Scornet
In particular, we show that, by slightly modifying their definition, random forests can be rewritten as kernel methods (called KeRF, for Kernel based on Random Forests), which are more interpretable and easier to analyze.
Statistics Theory
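The KeRF connection can be reproduced directly from scikit-learn's leaf assignments; this sketch is illustrative, not the paper's code:

```python
# KeRF sketch: the forest-induced kernel K(x, z) is the fraction of
# trees in which x and z fall in the same leaf, and the KeRF estimate is
# the corresponding Nadaraya-Watson average over the training points.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(size=(500, 2))
y = X[:, 0] + rng.normal(scale=0.1, size=500)

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

def kerf_predict(x_query):
    # Leaf index of every point in every tree: shape (n_points, n_trees).
    train_leaves = rf.apply(X)
    query_leaves = rf.apply(x_query.reshape(1, -1))[0]
    # K(x, x_i) = fraction of trees where x and x_i share a leaf.
    K = (train_leaves == query_leaves).mean(axis=1)
    return (K @ y) / K.sum()

x0 = np.array([0.5, 0.5])
print(kerf_predict(x0), rf.predict(x0.reshape(1, -1))[0])
```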
no code implementations • 7 Sep 2014 • Erwan Scornet
The last decade has witnessed a growing interest in random forest models, which are recognized to exhibit good practical performance, especially in high-dimensional settings.
Statistics Theory
no code implementations • 12 May 2014 • Erwan Scornet, Gérard Biau, Jean-Philippe Vert
What has greatly contributed to the popularity of forests is the fact that they can be applied to a wide range of prediction problems and have few parameters to tune.