ReMixMatch: Semi-Supervised Learning with Distribution Alignment and Augmentation Anchoring

We improve the recently proposed "MixMatch" semi-supervised learning algorithm by introducing two new techniques: distribution alignment and augmentation anchoring. Distribution alignment encourages the marginal distribution of predictions on unlabeled data to be close to the marginal distribution of ground-truth labels. Augmentation anchoring feeds multiple strongly augmented versions of an input into the model and encourages each output to be close to the prediction for a weakly augmented version of the same input. To produce strong augmentations, we propose a variant of AutoAugment which learns the augmentation policy while the model is being trained. Our new algorithm, dubbed ReMixMatch, is significantly more data-efficient than prior work, requiring between $5\times$ and $16\times$ less data to reach the same accuracy. For example, on CIFAR-10 with 250 labeled examples we reach $93.73\%$ accuracy (compared to MixMatch's accuracy of $93.58\%$ with $4{,}000$ examples) and a median accuracy of $84.92\%$ with just four labels per class. We make our code and data open-source at https://github.com/google-research/remixmatch.
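The two techniques described in the abstract can be sketched in a few lines. The NumPy code below is an illustrative sketch, not the authors' released implementation. Distribution alignment rescales each prediction on unlabeled data by the ratio of the labeled-data class marginal p(y) to a running average of the model's predictions, then renormalizes. Augmentation anchoring sharpens that aligned guess from the weakly augmented view (temperature sharpening, as in MixMatch) and reuses it as the target for K strongly augmented views. The exponential-moving-average form of the running average, the momentum of 0.999, the uniform initialization, and all function and class names are assumptions made for illustration. MixUp and the other loss terms of the full algorithm are omitted.

```python
import numpy as np


def sharpen(p, T=0.5):
    """Temperature sharpening: p_i^(1/T) / sum_j p_j^(1/T), row-wise."""
    p = p ** (1.0 / T)
    return p / p.sum(axis=-1, keepdims=True)


class DistributionAlignment:
    """Hypothetical helper: rescales predictions by p(y) / running_mean(q).

    Assumption: the running mean is an exponential moving average with a
    uniform initialization; the momentum value is illustrative only.
    """

    def __init__(self, label_marginal, momentum=0.999):
        self.p_y = np.asarray(label_marginal, dtype=float)  # labeled-data class marginal
        self.running = np.full_like(self.p_y, 1.0 / len(self.p_y))  # uniform start
        self.momentum = momentum

    def __call__(self, q):
        # q: (batch, classes) model predictions on weakly augmented unlabeled inputs.
        self.running = self.momentum * self.running + (1 - self.momentum) * q.mean(axis=0)
        aligned = q * self.p_y / self.running
        return aligned / aligned.sum(axis=-1, keepdims=True)  # renormalize each row


def anchored_targets(weak_probs, align, K=4, T=0.5):
    """Align and sharpen the weak-view guess, then copy it for K strong views."""
    guess = sharpen(align(weak_probs), T)
    return np.repeat(guess[None], K, axis=0)  # shape (K, batch, classes)


def cross_entropy(targets, probs, eps=1e-12):
    """Mean cross-entropy between soft targets and predicted probabilities."""
    return -(targets * np.log(probs + eps)).sum(axis=-1).mean()


# Usage: a batch whose predictions over-represent class 0 relative to p(y).
weak = np.tile([0.9, 0.1], (8, 1))
targets = anchored_targets(weak, DistributionAlignment([0.5, 0.5]), K=3)
strong = np.repeat(weak[None], 3, axis=0)  # stand-in for predictions on strong views
print(targets.shape, cross_entropy(targets, strong))
```

Because the running average in this sketch drifts toward the over-predicted class, the alignment step slightly lowers that class's probability before sharpening, which is the corrective pressure the abstract describes.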

Task                                  Dataset                 Model             Metric              Value        Global rank
Semi-Supervised Image Classification  CIFAR-100, 2500 Labels  ReMixMatch        Percentage error    27.43±0.31   #5
Semi-Supervised Image Classification  CIFAR-100, 400 Labels   ReMixMatch        Percentage error    44.28±2.06   #6
Semi-Supervised Image Classification  cifar10, 250 Labels     ReMixMatch        Percentage correct  93.73        #1
Semi-Supervised Image Classification  CIFAR-10, 250 Labels    ReMixMatch        Percentage error    6.27         #9
Semi-Supervised Image Classification  CIFAR-10, 4000 Labels   ReMixMatch        Percentage error    5.14         #16
Semi-Supervised Image Classification  CIFAR-10, 40 Labels     ReMixMatch        Percentage error    19.10        #10
Image Classification                  STL-10                  SWWAE             Percentage correct  74.30        #74
Image Classification                  STL-10                  MixMatch          Percentage correct  89.82        #32
Image Classification                  STL-10                  ReMixMatch (K=1)  Percentage correct  93.23        #23
Image Classification                  STL-10                  ReMixMatch (K=4)  Percentage correct  93.82        #21
Image Classification                  STL-10                  CC-GAN            Percentage correct  77.80        #65
Semi-Supervised Image Classification  STL-10, 1000 Labels     ReMixMatch        Accuracy            93.82        #4
Semi-Supervised Image Classification  SVHN, 1000 Labels       ReMixMatch        Accuracy            97.17        #6
