Automatically Optimized Gradient Boosting Trees for Classifying Large Volume High Cardinality Data Streams Under Concept Drift

Data abundance, combined with a scarcity of machine learning experts and domain specialists, necessitates progressive automation of end-to-end machine learning workflows. To this end, Automated Machine Learning (AutoML) has emerged as a prominent research area. Real-world data often arrives in streams or batches, and its distribution evolves over time, causing concept drift. Models must therefore handle data that is not independent and identically distributed (iid) and transfer knowledge across time through continuous self-evaluation and adaptation, while adhering to resource constraints. Creating autonomous, self-maintaining models that not only discover an optimal pipeline but also adapt automatically to concept drift in a lifelong learning setting was the crux of the NeurIPS 2018 AutoML challenge. We describe our winning solution to the challenge, entitled AutoGBT, which combines an adaptive, self-optimized, end-to-end machine learning pipeline based on gradient boosting trees with automatic hyper-parameter tuning using Sequential Model-Based Optimization (SMBO). We report experimental results on the challenge datasets as well as several benchmark datasets affected by concept drift, and compare AutoGBT with the challenge baseline model and Auto-sklearn. The results indicate the effectiveness of the proposed methodology in this context.
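To make the core idea concrete, below is a minimal sketch (not the authors' released code) of a gradient-boosting pipeline whose hyper-parameters are tuned by SMBO. It assumes LightGBM as the GBT implementation and hyperopt's TPE sampler as the SMBO engine; the function name tune_and_fit, the drift_window parameter, and the recent-window refit used as a simple stand-in for concept-drift adaptation are all illustrative choices, not details from the paper.

    # Sketch: GBT classifier with SMBO (TPE) hyper-parameter tuning and a
    # sliding-window refit as a simple proxy for concept-drift adaptation.
    # Assumes LightGBM + hyperopt; names and search ranges are illustrative.
    import lightgbm as lgb
    import numpy as np
    from hyperopt import fmin, tpe, hp, Trials
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    def tune_and_fit(X, y, max_evals=20, drift_window=50_000):
        # Keep only the most recent samples so the model tracks the
        # current concept rather than the full (drifted) history.
        X, y = X[-drift_window:], y[-drift_window:]
        # Hold out the most recent 20% (no shuffling) for validation,
        # mimicking the temporal order of a stream.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X, y, test_size=0.2, shuffle=False)

        # Search space over a few influential LightGBM hyper-parameters.
        space = {
            "learning_rate": hp.loguniform("learning_rate",
                                           np.log(0.01), np.log(0.3)),
            "num_leaves": hp.quniform("num_leaves", 16, 256, 1),
            "min_child_samples": hp.quniform("min_child_samples", 5, 100, 1),
            "subsample": hp.uniform("subsample", 0.5, 1.0),
        }

        def objective(params):
            model = lgb.LGBMClassifier(
                n_estimators=200,
                learning_rate=params["learning_rate"],
                num_leaves=int(params["num_leaves"]),
                min_child_samples=int(params["min_child_samples"]),
                subsample=params["subsample"],
            )
            model.fit(X_tr, y_tr)
            # hyperopt minimizes, so return the negative validation AUC.
            return -roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])

        # TPE is a sequential model-based optimizer: each trial is chosen
        # using a surrogate model fit to the previous trials' scores.
        best = fmin(objective, space, algo=tpe.suggest,
                    max_evals=max_evals, trials=Trials())

        # Refit on the full window with the best configuration found.
        final = lgb.LGBMClassifier(
            n_estimators=200,
            learning_rate=best["learning_rate"],
            num_leaves=int(best["num_leaves"]),
            min_child_samples=int(best["min_child_samples"]),
            subsample=best["subsample"],
        )
        return final.fit(X, y)

The windowed refit shown here is only one simple way to react to drift; the point of the sketch is the division of labor, with the GBT model handling the classification and the SMBO loop handling pipeline configuration.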
