Faster Neural Net Inference via Forests of Sparse Oblique Decision Trees

It is well established that large neural nets can be considerably compressed by techniques such as pruning, quantization or low-rank factorization. We show that neural nets can be further compressed by replacing some of their layers with a special type of decision forest, consisting of sparse oblique trees trained with the Tree Alternating Optimization (TAO) algorithm in a teacher-student approach. We find that the fully-connected and some convolutional layers of standard architectures can be replaced with a decision forest containing very few, shallow trees, so that the prediction accuracy is preserved or improved while the number of parameters and, especially, the inference time are greatly reduced. For example, replacing the last 7 layers of VGG16 with a single tree reduces the inference FLOPs by 7440$\times$ with a marginal increase in the test error, and a boosted ensemble of nine trees can match the network's performance while still reducing the FLOPs by 6289$\times$. The idea is orthogonal to other compression approaches, which can still be applied to the parts of the net not replaced by a forest.
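To make the inference-cost argument concrete, below is a minimal sketch (not the paper's implementation) of how a single sparse oblique tree routes an input: each internal node evaluates a sparse linear test $w^\top x > b$ on the incoming feature vector, so the cost per input is just the nonzero weights along one root-to-leaf path, versus the dense matrix multiplications of the fully-connected layers the tree replaces. All names, shapes, and the constant-logit leaves are assumptions made for illustration.

```python
import numpy as np

class Node:
    """Internal node: sparse linear test  w.x > b  stored as (feature indices, values, bias)."""
    def __init__(self, idx, w, b, left, right):
        self.idx, self.w, self.b = np.asarray(idx), np.asarray(w), b
        self.left, self.right = left, right   # child Node or Leaf

class Leaf:
    """Leaf: a constant output vector (e.g. class logits) -- an assumption of this sketch."""
    def __init__(self, value):
        self.value = np.asarray(value)

def predict(tree, x):
    """Route x down one root-to-leaf path; cost = nonzeros of w at each visited node."""
    node = tree
    while isinstance(node, Node):
        # Sparse dot product: touch only the few features this node actually uses.
        score = float(node.w @ x[node.idx]) + node.b
        node = node.right if score > 0 else node.left
    return node.value

# Toy example: a depth-2 tree over a 4096-dim feature vector (e.g. a conv-layer output),
# where each decision node uses only 3 of the 4096 features.
leafA, leafB, leafC = Leaf([1.0, 0.0]), Leaf([0.3, 0.7]), Leaf([0.0, 1.0])
child = Node(idx=[5, 99, 1024], w=[0.4, -1.2, 0.8], b=0.1, left=leafB, right=leafC)
root  = Node(idx=[0, 7, 2048],  w=[1.0,  0.5, -0.3], b=0.0, left=leafA, right=child)

x = np.random.randn(4096)
print(predict(root, x))   # ~6 multiply-adds instead of millions for dense FC layers
```

A boosted forest of the kind described in the abstract simply sums the outputs of a handful of such trees, so the total inference cost stays at a few dozen multiply-adds per input.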
