Big Self-Supervised Models are Strong Semi-Supervised Learners

One paradigm for learning from few labeled examples while making best use of a large amount of unlabeled data is unsupervised pretraining followed by supervised fine-tuning. Although this paradigm uses unlabeled data in a task-agnostic way, in contrast to common approaches to semi-supervised learning for computer vision, we show that it is surprisingly effective for semi-supervised learning on ImageNet. A key ingredient of our approach is the use of big (deep and wide) networks during pretraining and fine-tuning. We find that the fewer the labels, the more this approach (task-agnostic use of unlabeled data) benefits from a bigger network. After fine-tuning, the big network can be further improved and distilled into a much smaller one with little loss in classification accuracy by using the unlabeled examples for a second time, but in a task-specific way. The proposed semi-supervised learning algorithm can be summarized in three steps: unsupervised pretraining of a big ResNet model using SimCLRv2, supervised fine-tuning on a few labeled examples, and distillation with unlabeled examples for refining and transferring the task-specific knowledge. This procedure achieves 73.9% ImageNet top-1 accuracy with just 1% of the labels ($\le$13 labeled images per class) using ResNet-50, a $10\times$ improvement in label efficiency over the previous state-of-the-art. With 10% of labels, ResNet-50 trained with our method achieves 77.5% top-1 accuracy, outperforming standard supervised training with all of the labels.
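To make the third step of the recipe concrete, the sketch below illustrates distillation with unlabeled examples: the fine-tuned big network acts as a teacher whose temperature-scaled predictions on unlabeled images supervise a smaller student. This is a minimal PyTorch-style sketch, not the paper's released implementation; `teacher`, `student`, `unlabeled_loader`, `optimizer`, and the temperature value are illustrative assumptions.

```python
# Minimal sketch of step 3 (distillation with unlabeled examples), assuming
# plain PyTorch classifiers. `teacher`, `student`, `unlabeled_loader`,
# `optimizer`, and TEMPERATURE are placeholders, not names or values from
# the paper's released code.
import torch
import torch.nn.functional as F

TEMPERATURE = 1.0  # placeholder softening temperature

def distillation_loss(student_logits, teacher_logits, t=TEMPERATURE):
    # Cross-entropy between the teacher's temperature-scaled probabilities
    # and the student's temperature-scaled log-probabilities.
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()

def distill(teacher, student, unlabeled_loader, optimizer, device="cuda"):
    teacher.eval()
    student.train()
    for images, _ in unlabeled_loader:  # labels, if present, are ignored
        images = images.to(device)
        with torch.no_grad():
            # the fine-tuned big model imputes soft targets on unlabeled data
            teacher_logits = teacher(images)
        loss = distillation_loss(student(images), teacher_logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

When the student shares the teacher's architecture, the same procedure is the "self-distilled" setting reported in the results below; using a smaller student gives the "distilled" ResNet-50 numbers.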


Datasets

ImageNet

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 79.8% | #8 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 94.9% | #2 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-152 x3, SK) | Number of Params | 795M | #3 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 75.6% | #32 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 92.7% | #9 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50 x2) | Number of Params | 94M | #19 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 71.7% | #54 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 90.4% | #19 |
| Self-Supervised Image Classification | ImageNet | SimCLRv2 (ResNet-50) | Number of Params | 24M | #40 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 5 Accuracy | 95.0% | #2 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 80.2% | #2 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50) | Top 5 Accuracy | 93.4% | #4 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 77.5% | #5 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 5 Accuracy | 95.5% | #1 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 80.9% | #1 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 95.0% | #2 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 80.1% | #3 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 89.2% | #20 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 68.4% | #18 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 91.9% | #7 |
| Semi-Supervised Image Classification | ImageNet - 10% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 73.9% | #10 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 5 Accuracy | 93.4% | #1 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 self-distilled (ResNet-152 x3, SK) | Top 1 Accuracy | 76.6% | #1 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 5 Accuracy | 93.0% | #2 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50 x2, SK) | Top 1 Accuracy | 75.9% | #2 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50) | Top 5 Accuracy | 91.5% | #4 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 distilled (ResNet-50) | Top 1 Accuracy | 73.9% | #4 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50) | Top 5 Accuracy | 82.5% | #16 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50) | Top 1 Accuracy | 57.9% | #16 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 5 Accuracy | 92.3% | #3 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 74.9% | #3 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50 x2) | Top 5 Accuracy | 87.4% | #8 |
| Semi-Supervised Image Classification | ImageNet - 1% labeled data | SimCLRv2 (ResNet-50 x2) | Top 1 Accuracy | 66.3% | #9 |
| Self-Supervised Image Classification | ImageNet (finetuned) | SimCLRv2 (ResNet-152 x3, SK) | Number of Params | 795M | #1 |
| Self-Supervised Image Classification | ImageNet (finetuned) | SimCLRv2 (ResNet-152 x3, SK) | Top 1 Accuracy | 83.1% | #7 |

Methods