Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams Under Concept Drift

Data abundance, combined with a scarcity of machine learning experts and domain specialists, necessitates progressive automation of end-to-end machine learning workflows. To this end, Automated Machine Learning (AutoML) has emerged as a prominent research area. Real-world data often arrives in streams or batches, and its distribution evolves over time, causing concept drift. Models must therefore handle data that is not independent and identically distributed (iid) and transfer knowledge across time through continuous self-evaluation and adaptation, while adhering to resource constraints. Creating autonomous, self-maintaining models that not only discover an optimal pipeline but also adapt automatically to concept drift in a lifelong learning setting was the crux of the NeurIPS 2018 AutoML challenge. We describe our winning solution to the challenge, entitled AutoGBT, which combines an adaptive, self-optimized, end-to-end machine learning pipeline based on gradient boosting trees with automatic hyper-parameter tuning using Sequential Model-Based Optimization (SMBO). We report experimental results on the challenge datasets as well as several benchmark datasets affected by concept drift, and compare AutoGBT with the challenge baseline model and Auto-sklearn. The results indicate the effectiveness of the proposed methodology in this context.
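To make the core idea concrete, below is a minimal sketch (not the authors' released code) of a gradient-boosting pipeline whose hyper-parameters are tuned by SMBO. It assumes LightGBM as the GBT implementation and hyperopt's TPE sampler as the SMBO engine; the function name tune_and_fit, the drift_window parameter, and the recent-window refit used as a simple stand-in for concept-drift adaptation are all illustrative choices, not details from the paper.

    # Sketch: GBT classifier with SMBO (TPE) hyper-parameter tuning and a
    # sliding-window refit as a simple proxy for concept-drift adaptation.
    # Assumes LightGBM + hyperopt; names and search ranges are illustrative.
    import lightgbm as lgb
    import numpy as np
    from hyperopt import fmin, tpe, hp, Trials
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def tune_and_fit(X, y, max_evals=20, drift_window=50_000):
        # Keep only the most recent samples so the model tracks the
        # current concept rather than the full (drifted) history.
        X, y = X[-drift_window:], y[-drift_window:]
        # Hold out the most recent 20% (no shuffling) for validation,
        # mimicking the temporal order of a stream.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.2, shuffle=False)

        # Search space over a few influential LightGBM hyper-parameters.
        space = {
            "learning_rate": hp.loguniform("learning_rate",
                                           np.log(0.01), np.log(0.3)),
            "num_leaves": hp.quniform("num_leaves", 16, 256, 1),
            "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
            "subsample": hp.uniform("subsample", 0.5, 1.0),
        }

        def objective(params):
            model = lgb.LGBMClassifier(
                n_estimators=200,
                learning_rate=params["learning_rate"],
                num_leaves=int(params["num_leaves"]),
                min_child_samples=int(params["min_child_samples"]),
                subsample=params["subsample"],
            )
            model.fit(X_tr, y_tr)
            # hyperopt minimizes, so return the negative validation AUC.
            return -roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

        # TPE is a sequential model-based optimizer: each trial is chosen
        # using a surrogate model fit to the previous trials' scores.
        best = fmin(objective, space, algo=tpe.suggest,
                    max_evals=max_evals, trials=Trials())

        # Refit on the full window with the best configuration found.
        final = lgb.LGBMClassifier(
            n_estimators=200,
            learning_rate=best["learning_rate"],
            num_leaves=int(best["num_leaves"]),
            min_child_samples=int(best["min_child_samples"]),
            subsample=best["subsample"],
        )
        return final.fit(X, y)

The windowed refit shown here is only one simple way to react to drift; the point of the sketch is the division of labor, with the GBT model handling the classification and the SMBO loop handling pipeline configuration.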
