Boosting the Performance of Semi-Supervised Learning with Unsupervised Clustering

1 Dec 2020  ·  Boaz Lerner, Guy Shiran, Daphna Weinshall

Recently, Semi-Supervised Learning (SSL) has shown much promise in leveraging unlabeled data while being provided with very few labels. In this paper, we show that intermittently ignoring the labels altogether, for whole epochs at a time during training, can significantly improve performance in the small-sample regime. More specifically, we propose to train a network on two tasks jointly. The primary classification task is exposed to both the unlabeled and the scarcely annotated data, whereas the secondary task seeks to cluster the data without any labels. As opposed to the hand-crafted pretext tasks frequently used in self-supervision, our clustering phase utilizes the same classification network and head, in an attempt to relax the primary task and propagate the information from the labels without overfitting them. In addition, the self-supervised technique of classifying image rotations is incorporated during the unsupervised learning phase to stabilize training. We demonstrate our method's efficacy in boosting several state-of-the-art SSL algorithms, significantly improving their results and reducing running time on various standard semi-supervised benchmarks, including 92.6% accuracy on CIFAR-10 and 96.9% on SVHN, using only 4 labels per class in each task. We also notably improve the results in the extreme cases of 1, 2, and 3 labels per class, and show that the features learned by our model are more meaningful for separating the data.
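Two of the ingredients described above can be sketched concretely: the schedule that intermittently devotes whole epochs to the unsupervised task, and the rotation self-supervision that augments those epochs. The sketch below is a minimal illustration, not the paper's implementation; the `period` parameter of the schedule and the helper names are assumptions for illustration only.

```python
import numpy as np

def make_rotation_batch(images):
    """Rotation self-supervision: expand a batch of images (N, H, W)
    into all four 0/90/180/270-degree rotations, paired with the
    rotation class (0..3) the network must predict.
    Hypothetical helper; the paper's exact pipeline may differ."""
    rotated, labels = [], []
    for img in images:
        for k in range(4):               # k quarter-turns
            rotated.append(np.rot90(img, k))
            labels.append(k)
    return np.stack(rotated), np.array(labels)

def is_clustering_epoch(epoch, period=3):
    """Toy schedule: every `period`-th epoch ignores the labels and
    runs the unsupervised clustering task instead of classification.
    `period` is an assumed knob, not a value from the paper."""
    return epoch % period == period - 1
```

In a training loop, `is_clustering_epoch(epoch)` would select between the label-supervised classification objective and the label-free clustering objective (sharing the same network and head), with `make_rotation_batch` supplying the auxiliary rotation-prediction targets during the unsupervised epochs.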

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semi-Supervised Image Classification | CIFAR-10, 10 Labels | Semi-MMDC | Accuracy (Test) | 70.84±8.1 | #3 |
| Semi-Supervised Image Classification | CIFAR-10, 20 Labels | Semi-MMDC | Percentage error | 28.1±5.5 | #3 |
| Semi-Supervised Image Classification | CIFAR-10, 250 Labels | Semi-MMDC | Percentage error | 5.51±0.25 | #13 |
| Semi-Supervised Image Classification | CIFAR-10, 40 Labels | Semi-MMDC | Percentage error | 7.39±0.61 | #14 |
| Semi-Supervised Image Classification | STL-10, 1000 Labels | Semi-MMDC | Accuracy | 95.22±0.29 | #2 |
| Semi-Supervised Image Classification | SVHN, 250 Labels | Semi-MMDC | Accuracy | 97.7±0.03 | #2 |
| Semi-Supervised Image Classification | SVHN, 40 Labels | Semi-MMDC | Percentage error | 3.09±0.54 | #2 |
