Kernel Density Decision Trees

29 Sep 2021 · Jack Henry Good, Kyle Miller, Artur Dubrawski

We propose kernel density decision trees (KDDTs), a novel fuzzy decision tree (FDT) formalism based on kernel density estimation whose prediction performance often matches or exceeds that of conventional tree ensembles; ensembles of KDDTs generalize better still. FDTs mitigate the sensitivity and tendency to overfit of decision trees by representing uncertainty through fuzzy partitions. However, compared to conventional, crisp decision trees, FDTs are generally complex to apply, sensitive to design choices, slow to fit and to predict with, and difficult to interpret. Moreover, finding the optimal threshold for a given fuzzy split is challenging, so existing methods discretize the data, settle for near-optimal thresholds, or fuzzify already-fitted crisp trees. Our KDDTs address these shortcomings with a fast algorithm that finds optimal partitions for FDTs with piecewise-linear splitting functions, or for KDDTs with piecewise-constant fitting kernels. Prediction can be performed with or without fuzziness; without it, a KDDT is identical to a standard decision tree, but fit by a more robust algorithm. KDDTs simplify model fitting, ground design choices in the well-studied theory of density estimation, support optional incorporation of expert knowledge about uncertainty in the data, and enable interpretation in the context of kernels. We evaluate prediction performance against conventional decision trees and tree ensembles on 12 publicly available datasets.
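To make the formalism concrete, below is a minimal sketch, not the authors' implementation, of scoring a fuzzy split under a piecewise-constant (box) fitting kernel, one of the kernel families named in the abstract. Each sample's membership on either side of a threshold is the kernel mass falling on that side; candidate thresholds are scored here by weighted Gini impurity, a standard choice, and the brute-force scan stands in for the paper's fast optimal-partition algorithm. All names (left_membership, fuzzy_split_impurity, h) are hypothetical.

import numpy as np

def left_membership(x, threshold, h):
    # Fraction of a box kernel Uniform(x - h, x + h), centered at each sample,
    # that lies below the threshold; its CDF is piecewise linear, so the
    # membership has a closed form.
    return np.clip((threshold - (x - h)) / (2 * h), 0.0, 1.0)

def fuzzy_split_impurity(x, y, threshold, h, n_classes):
    # Weighted Gini impurity of a fuzzy split: each sample contributes
    # fractional weight to both children instead of falling crisply into one.
    w_left = left_membership(x, threshold, h)
    w_right = 1.0 - w_left

    def gini(w):
        total = w.sum()
        if total == 0:
            return 0.0
        p = np.array([w[y == c].sum() for c in range(n_classes)]) / total
        return 1.0 - np.sum(p ** 2)

    n = len(x)
    return (w_left.sum() * gini(w_left) + w_right.sum() * gini(w_right)) / n

# Toy usage: scan candidate thresholds on one feature of a two-class problem.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-1, 0.5, 50), rng.normal(1, 0.5, 50)])
y = np.array([0] * 50 + [1] * 50)
thresholds = np.linspace(x.min(), x.max(), 101)
scores = [fuzzy_split_impurity(x, y, t, h=0.3, n_classes=2) for t in thresholds]
print(f"best fuzzy threshold: {thresholds[int(np.argmin(scores))]:.3f}")

As the bandwidth h shrinks toward zero, each kernel collapses to a point mass, the memberships become crisp 0/1 indicators, and the scan reduces to an ordinary decision-tree split search, which parallels the abstract's observation that, without fuzziness, KDDTs reduce to standard decision trees.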
