Search Results for author: Lucas Mentch

Found 15 papers, 7 papers with code

Trees, Forests, Chickens, and Eggs: When and Why to Prune Trees in a Random Forest

no code implementations30 Mar 2021 Siyu Zhou, Lucas Mentch

Due to their long-standing reputation as excellent off-the-shelf predictors, random forests continue to remain a go-to model of choice for applied statisticians and data scientists.

Bridging Breiman's Brook: From Algorithmic Modeling to Statistical Learning

no code implementations23 Feb 2021 Lucas Mentch, Giles Hooker

In 2001, Leo Breiman wrote of a divide between "data modeling" and "algorithmic modeling" cultures.

Philosophy

Posterior Calibrated Training on Sentence Classification Tasks

1 code implementation ACL 2020 Taehee Jung, Dongyeop Kang, Hua Cheng, Lucas Mentch, Thomas Schaaf

Here we propose an end-to-end training procedure called posterior calibrated (PosCal) training that directly optimizes the objective while minimizing the difference between the predicted and empirical posterior probabilities. We show that PosCal not only helps reduce the calibration error but also improves task performance by penalizing drops in performance of both objectives.

Classification, General Classification, +2
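
A minimal sketch of the underlying idea, assuming a PyTorch-style setup: a standard task loss combined with a penalty on the gap between batch-average predicted class probabilities and the empirical label frequencies. The function name, the particular penalty, and the weight `lam` are illustrative assumptions rather than the paper's exact PosCal objective.

```python
import torch
import torch.nn.functional as F

def poscal_style_loss(logits, labels, num_classes, lam=1.0):
    """Cross-entropy plus a simple batch-level calibration penalty (illustrative)."""
    task_loss = F.cross_entropy(logits, labels)

    probs = F.softmax(logits, dim=-1)                              # predicted posteriors
    pred_freq = probs.mean(dim=0)                                  # average predicted probability per class
    emp_freq = F.one_hot(labels, num_classes).float().mean(dim=0)  # empirical class frequencies

    calib_penalty = torch.sum((pred_freq - emp_freq) ** 2)         # predicted-vs-empirical gap
    return task_loss + lam * calib_penalty

# toy usage
logits = torch.randn(8, 3, requires_grad=True)
labels = torch.randint(0, 3, (8,))
poscal_style_loss(logits, labels, num_classes=3).backward()
```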

Getting Better from Worse: Augmented Bagging and a Cautionary Tale of Variable Importance

no code implementations7 Mar 2020 Lucas Mentch, Siyu Zhou

As the size, complexity, and availability of data continue to grow, scientists are increasingly relying upon black-box learning algorithms that can often provide accurate predictions with minimal a priori model specifications.

$V$-statistics and Variance Estimation

1 code implementation2 Dec 2019 Zhengze Zhou, Lucas Mentch, Giles Hooker

This paper develops a general framework for analyzing asymptotics of $V$-statistics.
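
For context, the standard definition: a $V$-statistic with a symmetric kernel $h$ of degree $m$ averages the kernel over all $n^m$ index tuples (indices drawn with replacement), whereas the corresponding $U$-statistic averages only over distinct index combinations. The notation below is the usual textbook form, not anything specific to this paper:

$$V_n = \frac{1}{n^m} \sum_{i_1=1}^{n} \cdots \sum_{i_m=1}^{n} h\left(X_{i_1}, \ldots, X_{i_m}\right)$$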

Randomization as Regularization: A Degrees of Freedom Explanation for Random Forest Success

1 code implementation1 Nov 2019 Lucas Mentch, Siyu Zhou

Random forests remain among the most popular off-the-shelf supervised machine learning tools with a well-established track record of predictive accuracy in both regression and classification settings.

regression

Earlier Isn't Always Better: Sub-aspect Analysis on Corpus and System Biases in Summarization

1 code implementation IJCNLP 2019 Taehee Jung, Dongyeop Kang, Lucas Mentch, Eduard Hovy

We find that while position exhibits substantial bias in news articles, this is not the case, for example, with academic papers and meeting minutes.

News Summarization, Position

Locally Optimized Random Forests

1 code implementation27 Aug 2019 Tim Coleman, Kimberly Kaufeld, Mary Frances Dorn, Lucas Mentch

To estimate these ratios with an unlabeled test set, we make the covariate shift assumption, under which the training and test distributions differ only in the distribution of the covariates (Shimodaira, 2000).
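
One common way to estimate such covariate-shift density ratios $p_{\text{test}}(x)/p_{\text{train}}(x)$ is with a probabilistic classifier that distinguishes training from test covariates; the sketch below illustrates that general recipe and is not necessarily the estimator used in this paper. The names and synthetic data are illustrative.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 3))   # labeled training covariates
X_test = rng.normal(0.5, 1.0, size=(300, 3))    # unlabeled test covariates

# Label the origin of each point (0 = train, 1 = test) and fit a classifier.
X_all = np.vstack([X_train, X_test])
origin = np.concatenate([np.zeros(len(X_train)), np.ones(len(X_test))])
clf = LogisticRegression(max_iter=1000).fit(X_all, origin)

# Convert P(test | x) into an estimated density ratio, correcting for sample sizes.
p_test_given_x = clf.predict_proba(X_train)[:, 1]
ratio = (len(X_train) / len(X_test)) * p_test_given_x / (1.0 - p_test_given_x)
print(ratio[:5])   # importance weights for the training points
```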

Asymptotic Distributions and Rates of Convergence for Random Forests via Generalized U-statistics

no code implementations25 May 2019 Wei Peng, Tim Coleman, Lucas Mentch

Random forests remain among the most popular off-the-shelf supervised learning algorithms.

Unrestricted Permutation forces Extrapolation: Variable Importance Requires at least One More Model, or There Is No Free Variable Importance

1 code implementation1 May 2019 Giles Hooker, Lucas Mentch, Siyu Zhou

This paper reviews and advocates against the use of permute-and-predict (PaP) methods for interpreting black box functions.
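
To make the target of the critique concrete, here is a bare-bones permute-and-predict importance measure of the kind the paper argues against: permute one feature at a time and record the drop in accuracy. This is shown only as an illustration of PaP itself; the data and names are made up.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=400)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
baseline = rf.score(X, y)                          # R^2 with intact features

for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])   # break feature j's link to y
    print(f"feature {j}: drop in R^2 = {baseline - rf.score(X_perm, y):.3f}")
```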

Scalable and Efficient Hypothesis Testing with Random Forests

2 code implementations16 Apr 2019 Tim Coleman, Wei Peng, Lucas Mentch

Throughout the last decade, random forests have established themselves as among the most accurate and popular supervised learning methods.

Two-sample testing

Bootstrap Bias Corrections for Ensemble Methods

no code implementations1 Jun 2015 Giles Hooker, Lucas Mentch

This paper examines the use of a residual bootstrap for bias correction in machine learning regression methods.

BIG-bench Machine Learning, regression
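
A simplified sketch of a generic residual-bootstrap bias correction for a regression learner, under the assumption that bias is estimated by refitting on residual-resampled responses; the refit count, the learner, and the correction step here are illustrative and not necessarily the paper's exact procedure.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=300)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
fitted = model.predict(X)
residuals = y - fitted

# Refit on residual-bootstrap responses and average the resulting predictions.
boot_preds = []
for b in range(20):
    y_star = fitted + rng.choice(residuals, size=len(y), replace=True)
    m_star = RandomForestRegressor(n_estimators=100, random_state=b).fit(X, y_star)
    boot_preds.append(m_star.predict(X))

bias_est = np.mean(boot_preds, axis=0) - fitted   # estimated bias at each point
corrected = fitted - bias_est                     # bias-corrected predictions
```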

Formal Hypothesis Tests for Additive Structure in Random Forests

no code implementations7 Jun 2014 Lucas Mentch, Giles Hooker

While statistical learning methods have proved powerful tools for predictive modeling, the black-box nature of the models they produce can severely limit their interpretability and the ability to conduct formal inference.

Quantifying Uncertainty in Random Forests via Confidence Intervals and Hypothesis Tests

no code implementations25 Apr 2014 Lucas Mentch, Giles Hooker

Instead of aggregating full bootstrap samples, we consider predicting by averaging over trees built on subsamples of the training set and demonstrate that the resulting estimator takes the form of a U-statistic.
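
For reference, averaging the predictions of trees built on size-$k$ subsamples gives an estimator of the familiar (complete) $U$-statistic form; the notation here, with $Z_i$ the training observations and $T_x$ the prediction at a query point $x$ from a tree grown on the given subsample, is illustrative rather than taken from the paper:

$$\hat{\mu}(x) = \binom{n}{k}^{-1} \sum_{1 \le i_1 < \cdots < i_k \le n} T_x\left(Z_{i_1}, \ldots, Z_{i_k}\right)$$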
