Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?

Despite recent progress by self-supervised methods in representation learning with residual networks (ResNets), they still underperform supervised learning on the ImageNet classification benchmark, which limits their applicability in performance-critical settings. Building on prior theoretical insights from ReLIC [Mitrovic et al., 2021], we incorporate additional inductive biases into self-supervised learning. We propose a new self-supervised representation learning method, ReLICv2, which combines an explicit invariance loss with a contrastive objective over a varied set of appropriately constructed data views to avoid learning spurious correlations and obtain more informative representations. ReLICv2 achieves $77.1\%$ top-$1$ accuracy on ImageNet under linear evaluation on a ResNet50, improving on the previous state of the art by an absolute $+1.5\%$; on larger ResNet models, ReLICv2 achieves up to $80.6\%$, outperforming previous self-supervised approaches by margins of up to $+2.3\%$. Most notably, ReLICv2 is the first unsupervised representation learning method to consistently outperform the supervised baseline in a like-for-like comparison over a range of ResNet architectures. Using ReLICv2, we also learn more robust and transferable representations that generalize better out-of-distribution than previous work, on both image classification and semantic segmentation. Finally, we show that despite using ResNet encoders, ReLICv2 is comparable to state-of-the-art self-supervised vision transformers.
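The combination of a contrastive objective with an explicit invariance loss can be illustrated with a minimal NumPy sketch. It assumes a ReLIC-style formulation in which the invariance term is a KL divergence between the similarity distributions obtained when each of two augmented views of the same image acts as the anchor. The function name `relic_style_loss`, the `temperature` and the weight `alpha` are hypothetical; the sketch omits the encoder networks, gradient handling and the construction of the varied data views, and it is not the authors' implementation.

```python
import numpy as np


def log_softmax(logits, axis=-1):
    # Numerically stable log-softmax along `axis`.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def l2_normalize(x, eps=1e-8):
    # Project each embedding onto the unit sphere so dot products are cosines.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def relic_style_loss(z_a, z_b, temperature=0.1, alpha=1.0):
    """Hypothetical helper: contrastive (InfoNCE) term plus a KL invariance penalty.

    z_a, z_b: [batch, dim] embeddings of two augmented views; row i of z_a and
    row i of z_b are assumed to come from the same image.
    """
    z_a, z_b = l2_normalize(z_a), l2_normalize(z_b)
    # logits_ab[i, j] = cos(z_a[i], z_b[j]) / temperature
    logits_ab = z_a @ z_b.T / temperature
    logits_ba = logits_ab.T  # anchors from view b, candidates from view a
    log_p_ab = log_softmax(logits_ab, axis=1)
    log_p_ba = log_softmax(logits_ba, axis=1)
    idx = np.arange(z_a.shape[0])
    # Contrastive term: the other view of the same image should be the most likely candidate.
    contrastive = -(log_p_ab[idx, idx].mean() + log_p_ba[idx, idx].mean()) / 2
    # Invariance term: KL(p_ab || p_ba), averaged over anchors; it is zero when the
    # similarity distribution does not depend on which view serves as the anchor.
    kl = (np.exp(log_p_ab) * (log_p_ab - log_p_ba)).sum(axis=1).mean()
    return contrastive + alpha * kl
```

Setting `alpha=0.0` reduces the sketch to a plain symmetric InfoNCE loss, which isolates what the explicit invariance penalty adds on top of the contrastive objective.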

Task | Dataset | Model | Metric | Value | Global Rank
Semantic Segmentation | Cityscapes val | ReLICv2 | mIoU | 75.2% | #61
Semantic Segmentation | Cityscapes val | BYOL | mIoU | 74.6% | #64
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-200) | Top-1 Accuracy | 79.8% | #26
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-152) | Top-1 Accuracy | 79.3% | #32
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-101) | Top-1 Accuracy | 78.7% | #38
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-50) | Top-1 Accuracy | 77.1% | #48
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-200 x2) | Top-1 Accuracy | 80.6% | #21
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-50 x2) | Top-1 Accuracy | 79.0% | #35
Self-Supervised Image Classification | ImageNet | ReLICv2 (ResNet-50 x4) | Top-1 Accuracy | 79.4% | #31
Semi-Supervised Image Classification | ImageNet (10% labeled data) | ReLICv2 (ResNet-50) | Top-5 Accuracy | 91.2% | #17
Semi-Supervised Image Classification | ImageNet (10% labeled data) | ReLICv2 (ResNet-50) | Top-1 Accuracy | 72.4% | #31
Semi-Supervised Image Classification | ImageNet (1% labeled data) | ReLICv2 | Top-5 Accuracy | 81.3% | #20
Semi-Supervised Image Classification | ImageNet (1% labeled data) | ReLICv2 | Top-1 Accuracy | 58.1% | #34
Image Classification | ObjectNet | SimCLR | Top-1 Accuracy | 14.6% | #99
Image Classification | ObjectNet | BYOL | Top-1 Accuracy | 23.0% | #85
Image Classification | ObjectNet | ReLIC | Top-1 Accuracy | 23.8% | #84
Image Classification | ObjectNet | ReLICv2 | Top-1 Accuracy | 25.9% | #78
Semantic Segmentation | PASCAL VOC 2012 val | DetCon | mIoU | 77.3% | #16
Semantic Segmentation | PASCAL VOC 2012 val | ReLICv2 | mIoU | 77.9% | #14
Semantic Segmentation | PASCAL VOC 2012 val | BYOL | mIoU | 75.7% | #20
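The head-to-head rows in the results table (ReLICv2 against BYOL, ReLIC and SimCLR on the same benchmark and metric) can be read as absolute differences in metric points. The short sketch computes them from the listed values; `results` and `gains_over_baselines` are hypothetical names used only for illustration.

```python
# Values copied from the results table (same metric within each benchmark).
results = {
    "Cityscapes val (mIoU)": {"ReLICv2": 75.2, "BYOL": 74.6},
    "PASCAL VOC 2012 val (mIoU)": {"ReLICv2": 77.9, "BYOL": 75.7},
    "ObjectNet (Top-1 Accuracy)": {
        "ReLICv2": 25.9,
        "ReLIC": 23.8,
        "BYOL": 23.0,
        "SimCLR": 14.6,
    },
}


def gains_over_baselines(table, method="ReLICv2"):
    """Hypothetical helper: absolute gain of `method` over every other entry, in points."""
    gains = {}
    for benchmark, scores in table.items():
        ref = scores[method]
        # Round to one decimal to match the precision reported in the table.
        gains[benchmark] = {
            name: round(ref - score, 1)
            for name, score in scores.items()
            if name != method
        }
    return gains


gains = gains_over_baselines(results)
```

For example, on ObjectNet this gives a gain of 2.1 points over ReLIC and 11.3 points over SimCLR.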

Methods