Split Regression Modeling

13 Dec 2018  ·  Anthony Christidis, Stefan Van Aelst, Ruben Zamar ·

Sparse methods are the standard approach to obtain interpretable models with high prediction accuracy. Alternatively, algorithmic ensemble methods can achieve higher prediction accuracy at the cost of interpretability. However, the use of black-box methods has been heavily criticized for high-stakes decisions, and it has been argued that there does not have to be a trade-off between accuracy and interpretability. To combine high accuracy with interpretability, we generalize best subset selection to best split selection. Best split selection constructs a small number of sparse models that are learned jointly from the data and then combined in an ensemble. Best split selection determines the models by splitting the available predictor variables among the different models when fitting the data. The proposed methodology results in an ensemble of sparse and diverse models that each provide a possible explanation for the relationship between the predictors and the response. The high computational cost of best split selection motivates the need for computationally tractable approximations. We evaluate a method developed by Christidis et al. (2020) which can be seen as a multi-convex relaxation of best split selection.
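The core idea above can be illustrated with a minimal sketch: partition the predictors into disjoint groups, fit one model per group (so each model is sparse in the full predictor space), and average their predictions. The fixed partition, group count, and least-squares fits below are illustrative assumptions; best split selection, and the relaxation of Christidis et al. (2020), instead learn the split from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, G = 200, 12, 3  # hypothetical sizes: n observations, p predictors, G models

# Synthetic data with a sparse true coefficient vector.
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Hypothetical fixed partition of the p predictors into G disjoint groups;
# best split selection would learn this partition jointly from the data.
groups = np.array_split(np.arange(p), G)

# Fit one model per group: each model uses only its own predictors,
# so every model is sparse in the full predictor space.
coefs = np.zeros((G, p))
for g, idx in enumerate(groups):
    b, *_ = np.linalg.lstsq(X[:, idx], y, rcond=None)
    coefs[g, idx] = b

# Ensemble prediction: average the G models' predictions.
y_hat = (X @ coefs.T).mean(axis=1)
print(y_hat.shape)  # (200,)
```

Because the groups are disjoint, each predictor enters exactly one model, which is what makes the ensemble members diverse while each remains individually interpretable.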

