Stochastic Induction of Decision Trees with Application to Learning Haar Tree

29 Sep 2021  ·  Azar Alizadeh, Pooya Tavallali, Vahid Behzadan, Mukesh Singhal ·

Decision trees are a convenient and established approach for any supervised learning task. Decision trees are used in a broad range of applications from medical imaging to computer vision. Decision trees are trained by greedily splitting the leaf nodes into a split and two leaf nodes until a certain stopping criterion is reached. The procedure of splitting a node consists of finding the best feature and threshold that minimizes a criterion. The criterion minimization problem is solved through an exhaustive search algorithm. However, this exhaustive search algorithm is very expensive, especially, if the number of samples and features are high. In this paper, we propose a novel stochastic approach for the criterion minimization. Asymptotically, the proposed algorithm is faster than conventional exhaustive search by several orders of magnitude. It is further shown that the proposed approach minimizes an upper bound for the criterion. Experimentally, the algorithm is compared with several other related state-of-the-art decision tree learning methods, including the baseline non-stochastic approach. The proposed algorithm outperforms every other decision tree learning (including online and fast) approaches and performs as well as the baseline algorithm in terms of accuracy and computational cost, despite being non-deterministic. For empirical evaluation, we apply the proposed algorithm to learn a Haar tree over MNIST dataset that consists of over $200,000$ features and $60,000$ samples. This tree achieved a test accuracy of $94\%$ over MNIST which is $4\%$ higher than any other known axis-aligned tree. This result is comparable to the performance of oblique trees, while providing a significant speed-up at both inference and training times.

PDF Abstract
No code implementations yet. Submit your code now

Tasks


Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods