Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision

Discriminative self-supervised learning allows training models on any random group of internet images, and possibly recovering salient information that helps differentiate between the images. Applied to ImageNet, this leads to object-centric features that perform on par with supervised features on most object-centric downstream tasks. In this work, we ask whether, using this ability, we can learn any salient and more representative information present in a diverse, unbounded set of images from across the globe. To do so, we train models on billions of random images without any data pre-processing or prior assumptions about what we want the model to learn. We scale our model to a dense 10 billion parameters to avoid underfitting on such a large dataset. We extensively study and validate our model's performance on over 50 benchmarks, including fairness, robustness to distribution shift, geographical diversity, fine-grained recognition, image copy detection and many image classification datasets. The resulting model not only captures semantic information well, but also captures information about artistic style and learns salient information such as geolocations and multilingual word embeddings based on visual content only. More importantly, we discover that such a model is more robust, more fair, less harmful and less biased than supervised models or models trained on object-centric datasets such as ImageNet.
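The abstract does not name the discriminative objective. As a hedged illustration only, the sketch below assumes a SwAV-style swapped-prediction loss, the objective associated with earlier SEER models. Two augmented views of each image are scored against prototype vectors. Sinkhorn-Knopp turns each view's scores into balanced soft cluster assignments, and each view is trained to predict the other view's assignment. The function names, `eps`, `temp`, and the NumPy setting (no encoder, no gradient step) are illustrative assumptions, not the paper's code.

```python
import numpy as np


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Balanced soft assignments of B samples to K prototypes (Sinkhorn-Knopp).

    scores: (B, K) similarity of each sample to each prototype.
    Returns a (B, K) matrix whose rows sum to 1, with mass spread roughly
    equally across prototypes (the balancing that prevents collapse).
    """
    # Subtracting the global max cancels out after normalization but avoids overflow.
    q = np.exp((scores - scores.max()) / eps).T  # (K, B)
    q /= q.sum()
    k, b = q.shape
    for _ in range(n_iters):
        q /= q.sum(axis=1, keepdims=True)  # each prototype gets total mass 1/K
        q /= k
        q /= q.sum(axis=0, keepdims=True)  # each sample gets total mass 1/B
        q /= b
    return (q * b).T  # (B, K); each row is a distribution over prototypes


def log_softmax(x, temp):
    z = x / temp
    z = z - z.max(axis=1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def swapped_prediction_loss(z1, z2, prototypes, temp=0.1):
    """SwAV-style loss for two views z1, z2 of shape (B, D) and prototypes (K, D)."""
    def l2n(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)

    c = l2n(prototypes)
    s1, s2 = l2n(z1) @ c.T, l2n(z2) @ c.T  # (B, K) cosine similarities
    q1, q2 = sinkhorn(s1), sinkhorn(s2)    # codes, used as fixed targets
    # Predict view 2's code from view 1's scores, and vice versa.
    ce12 = -(q2 * log_softmax(s1, temp)).sum(axis=1)
    ce21 = -(q1 * log_softmax(s2, temp)).sum(axis=1)
    return float(np.mean(ce12 + ce21) / 2)


rng = np.random.default_rng(0)
z1 = rng.normal(size=(8, 16))
z2 = z1 + 0.1 * rng.normal(size=(8, 16))  # a slightly perturbed "second view"
protos = rng.normal(size=(4, 16))
print(swapped_prediction_loss(z1, z2, protos))
```

In a real training loop, the encoder and prototypes would be updated by backpropagating through the log-softmax term only. The Sinkhorn codes act as targets, which is what lets such an objective run on uncurated images without labels.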


Results from the Paper


 Ranked #1 on Copy Detection on Copydays strong subset (using extra training data)

Task | Dataset | Model | Metric | Value | Global rank
--- | --- | --- | --- | --- | ---
Fine-Grained Image Classification | Caltech-101 | SEER (RegNet10B - linear eval) | Top-1 Error Rate | 9.0% | #10
Fine-Grained Image Classification | Caltech-101 | SEER (RegNet10B - linear eval) | Accuracy | 91.0 | #1
Image Classification | CIFAR-10 | SEER (RegNet10B) | Percentage correct | 90 | #188
Image Classification | CIFAR-100 | SEER (RegNet10B) | Percentage correct | 81.53 | #116
Image Classification | CLEVR/Count | SEER (RegNetY-128GF) | Top 1 Accuracy | 87.98 | #2
Image Classification | CLEVR/Count | SEER (RegNet10B) | Top 1 Accuracy | 89.28 | #1
Image Classification | CLEVR/Dist | SEER (RegNet10B) | Top 1 Accuracy | 74.98 | #1
Image Classification | CLEVR/Dist | SEER (RegNetY-128GF) | Top 1 Accuracy | 72.67 | #2
Copy Detection | Copydays strong subset | SEER (RegNet10B) | mAP | 90.6 | #1
Image Classification | DTD | SEER (RegNet10B - linear eval) | Accuracy | 80.5 | #5
Image Classification | EuroSAT | SEER (RegNet10B - linear eval) | Accuracy (%) | 97.5 | #11
Fine-Grained Image Classification | FGVC Aircraft | SEER (RegNet10B) | Accuracy | 54.82% | #51
Image Classification | Flowers-102 | SEER (RegNet10B) | Accuracy | 96.3 | #41
Image Classification | Food-101 | SEER (RegNet10B - linear eval) | Accuracy (%) | 90.3 | #2
Traffic Sign Recognition | GTSRB | SEER (RegNet10B) | Accuracy | 90.71% | #4
Meme Classification | Hateful Memes | SEER (RegNet10B) | ROC-AUC | 0.734 | #7
Image Classification | ImageNet | SEER (RG-10B) | Top 1 Accuracy | 85.8% | #187
Image Classification | ImageNet | SEER (RG-10B) | Number of params | 10000M | #979
Semi-Supervised Image Classification | ImageNet - 10% labeled data | SEER (RegNet10B) | Top 1 Accuracy | 78.8% | #10
Semi-Supervised Image Classification | ImageNet - 1% labeled data | SEER (RegNet10B) | Top 1 Accuracy | 62.4% | #28
Domain Generalization | ImageNet-A | SEER (RegNet10B) | Top-1 accuracy % | 52.7 | #18
Self-Supervised Image Classification | ImageNet (finetuned) | SEER (RegNet10B) | Number of Params | 10000M | #1
Self-Supervised Image Classification | ImageNet (finetuned) | SEER (RegNet10B) | Top 1 Accuracy | 85.8% | #20
Domain Generalization | ImageNet-R | SEER (RegNet10B) | Top-1 Error Rate | 43.9 | #19
Image Classification | ImageNet ReaL | SEER (RegNet10B) | Accuracy | 89.8% | #22
Image Classification | ImageNet ReaL | SEER (RegNet10B) | Params | 10000M | #57
Domain Generalization | ImageNet-Sketch | SEER (RegNet10B) | Top-1 accuracy | 45.6 | #15
Image Classification | ImageNet V2 | SEER (RegNet10B) | Top 1 Accuracy | 76.2 | #17
Out-of-Distribution Generalization | ImageNet-W | SEER (RegNet-32gf, fine-tuning, IG-1B) | IN-W Gap | -6.5 | #1
Out-of-Distribution Generalization | ImageNet-W | SEER (RegNet-32gf, fine-tuning, IG-1B) | Carton Gap | +18 | #1
Image Classification | iNaturalist 2018 | SEER (RegNet10B - finetuned - 384px) | Top-1 Accuracy | 84.7% | #8
Action Classification | Kinetics-700 | SEER (RegNet10B) | Top-1 Accuracy | 51.9 | #34
Image Classification | KITTI-Dist | SEER (RegNet10B) | Top 1 Accuracy | 78.34 | #1
Image Classification | MNIST | SEER (RegNet10B) | Percentage error | 0.58 | #45
Image Classification | MNIST | SEER (RegNet10B) | Accuracy | 99.42 | #16
Image Classification | ObjectNet | SEER (RegNet10B) | Top-1 Accuracy | 60.2 | #19
Fine-Grained Image Classification | Oxford-IIIT Pet Dataset | SEER (RegNet10B) | Accuracy | 85.3% | #16
Image Classification | Places205 | SEER (RegNet10B - finetuned - 384px) | Top 1 Accuracy | 69.0 | #3
Image Classification | RESISC45 | DeiT-B/16 | Top 1 Accuracy | 92.48 | #10
Image Classification | RESISC45 | CLIP (ViT-B/16) | Top 1 Accuracy | 92.7 | #8
Image Classification | RESISC45 | MoCo-v2 (ResNet50) | Top 1 Accuracy | 85.4 | #14
Image Classification | RESISC45 | SEER (RegNet10B) | Top 1 Accuracy | 95.61 | #2
Image Classification | RESISC45 | SwAV (ResNet50-w5) | Top 1 Accuracy | 94.73 | #4
Image Classification | RESISC45 | DINO (DeiT-B/16) | Top 1 Accuracy | 93.97 | #5
Image Classification | RESISC45 | MoCo-v3 (ViT-B/16) | Top 1 Accuracy | 93.35 | #7
Image Classification | RESISC45 | SimCLR-v2 (ResNet152-w3 + SK) | Top 1 Accuracy | 89.77 | #11
Image Classification | RESISC45 | ResNet50 (ImageNet-supervised) | Top 1 Accuracy | 88.56 | #12
Fine-Grained Image Classification | Stanford Cars | SEER (RegNet10B) | Accuracy | 68.03% | #73
Image Classification | STL-10 | SEER (RegNet10B) | Percentage correct | 97.3 | #13
Image Classification | STL-10 | SEER (RegNet10B) | Params | 10000M | #118
Fine-Grained Image Classification | SUN397 | SEER (RegNet10B - linear eval) | Accuracy | 80.0 | #2
Image Classification | SVHN | SEER (RegNet10B) | Percentage error | 13.6 | #47
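The results mix accuracy-style metrics, error-style metrics, and mAP for copy detection. For example, Caltech-101 lists both a 9.0% top-1 error rate and 91.0 accuracy, and MNIST lists 0.58 percentage error alongside 99.42 accuracy. The sketch below shows how these quantities relate, assuming textbook definitions. Top-1 error is 100 minus top-1 accuracy, and mAP is the mean over queries of non-interpolated average precision. The function names are hypothetical. Benchmark-specific evaluation protocols, such as the one used for Copydays, may differ in details.

```python
import numpy as np


def top1_accuracy(logits, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    preds = np.argmax(np.asarray(logits), axis=1)
    return 100.0 * float(np.mean(preds == np.asarray(labels)))


def top1_error(accuracy_pct):
    """Top-1 error rate (%) is the complement of top-1 accuracy (%)."""
    return 100.0 - accuracy_pct


def average_precision(scores, is_positive):
    """Non-interpolated AP for one query: mean precision at each positive's rank."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # best match first
    hits = np.asarray(is_positive, dtype=bool)[order]
    if not hits.any():
        return 0.0
    ranks = np.arange(1, hits.size + 1)
    precision_at_rank = np.cumsum(hits) / ranks
    return float(precision_at_rank[hits].mean())


def mean_average_precision(per_query):
    """mAP: average of per-query AP; per_query is a list of (scores, is_positive)."""
    return float(np.mean([average_precision(s, p) for s, p in per_query]))


print(top1_error(91.0))  # the Caltech-101 row: 91.0 accuracy corresponds to 9.0 error
print(top1_accuracy([[2, 1], [0, 3], [1, 0]], [0, 1, 1]))
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
```

Because error and accuracy are complements, a lower "Percentage error" (MNIST, SVHN) or "Top-1 Error Rate" is better, while every other metric in the table is higher-is-better. The exception is the ImageNet-W gap metrics, whose interpretation this page does not state.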
