Mako: Semi-supervised continual learning with minimal labeled data via data programming

29 Sep 2021  ·  Pengyuan Lu, Seungwon Lee, Amanda Watson, David Kent, Insup Lee, Eric Eaton, James Weimer

Lifelong machine learning (LML) is a well-known paradigm that mimics the human learning process by utilizing experience from previous tasks. However, one issue that has rarely been addressed is the lack of labels at the individual task level. The state of the art in LML largely targets supervised learning; the few semi-supervised continual learning exceptions require training additional models, which in turn imposes constraints on the LML methods themselves. We therefore propose Mako, a wrapper tool that mounts on top of supervised LML frameworks and leverages data programming. Mako imposes no additional knowledge-base overhead and enables semi-supervised continual learning with a limited amount of labeled data, achieving per-task accuracy and resistance to catastrophic forgetting comparable to training on fully labeled data. In extensive experiments on LML task sequences created from standard image classification datasets, including MNIST, CIFAR-10, and CIFAR-100, LML tools using Mako to leverage unlabeled data achieve $97\%$ of the performance of supervised learning on fully labeled data, in terms of both accuracy and catastrophic forgetting prevention. Moreover, Mako significantly outperforms baseline semi-supervised LML tools such as CNNL, ORDisCo, and DistillMatch, increasing accuracy by $0.25$ on certain benchmarks.
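
The abstract gives no implementation details, but the data-programming step it builds on can be sketched as follows: a handful of heuristic labeling functions vote on each unlabeled example, the votes are aggregated into pseudo-labels, and the pseudo-labeled pool is handed to an off-the-shelf supervised LML learner for each task. The labeling functions, the `pseudo_label` aggregator, and the `train_task` call below are illustrative assumptions, not Mako's actual API.

```python
import numpy as np

ABSTAIN = -1  # a labeling function may abstain on examples it cannot judge

# Hypothetical labeling functions for a toy two-class image task.
def lf_bright(x):
    """Vote class 1 when the image is bright, otherwise abstain."""
    return 1 if x.mean() > 0.6 else ABSTAIN

def lf_dark(x):
    """Vote class 0 when the image is dark, otherwise abstain."""
    return 0 if x.mean() < 0.4 else ABSTAIN

def pseudo_label(pool, lfs, n_classes=2):
    """Aggregate labeling-function votes into pseudo-labels by majority vote.

    Returns the indices of examples that received at least one vote and
    their pseudo-labels; examples on which every LF abstained are dropped.
    """
    kept, labels = [], []
    for i, x in enumerate(pool):
        votes = [v for v in (lf(x) for lf in lfs) if v != ABSTAIN]
        if not votes:
            continue
        counts = np.bincount(votes, minlength=n_classes)
        kept.append(i)
        labels.append(int(counts.argmax()))
    return np.array(kept, dtype=int), np.array(labels, dtype=int)

# Per-task usage: pseudo-label the task's unlabeled pool, then train any
# supervised lifelong learner on the enlarged labeled set.
rng = np.random.default_rng(0)
pool = rng.random((100, 8, 8))  # stand-in for one task's unlabeled images
idx, y_pseudo = pseudo_label(pool, [lf_bright, lf_dark])
print(f"pseudo-labeled {len(idx)} of {len(pool)} examples")
# lml_learner.train_task(pool[idx], y_pseudo)  # hypothetical supervised LML call
```

Majority vote is the simplest possible aggregator; data-programming frameworks such as Snorkel instead fit a generative label model that weights each labeling function by its estimated accuracy, which typically yields better pseudo-labels.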
