Big Transfer (BiT): General Visual Representation Learning

Transfer of pre-trained representations improves sample efficiency and simplifies hyperparameter tuning when training deep neural networks for vision. We revisit the paradigm of pre-training on large supervised datasets and fine-tuning the model on a target task. We scale up pre-training, and propose a simple recipe that we call Big Transfer (BiT). By combining a few carefully selected components, and transferring using a simple heuristic, we achieve strong performance on over 20 datasets. BiT performs well across a surprisingly wide range of data regimes, from 1 example per class to 1M total examples. BiT achieves 87.5% top-1 accuracy on ILSVRC-2012, 99.4% on CIFAR-10, and 76.3% on the 19-task Visual Task Adaptation Benchmark (VTAB). On small datasets, BiT attains 76.8% on ILSVRC-2012 with 10 examples per class, and 97.0% on CIFAR-10 with 10 examples per class. We conduct detailed analysis of the main components that lead to high transfer performance.
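
To make the low-data regime concrete: with 10 examples per class, the fine-tuning set is 10,000 images for the 1,000-class ILSVRC-2012 and only 100 images for the 10-class CIFAR-10. The sketch below shows one way such a k-per-class subset could be drawn from a list of labels. It is an illustration only, not the paper's data pipeline; the helper name `sample_k_per_class`, the seed handling, and the toy label list are assumptions made for this example.

```python
import random
from collections import defaultdict


def sample_k_per_class(labels, k, seed=0):
    """Return sorted indices of a subset with exactly k examples per class.

    Hypothetical helper for illustration; not taken from the BiT paper.
    """
    # Group example indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    chosen = []
    for label, indices in by_class.items():
        if len(indices) < k:
            raise ValueError(f"class {label!r} has only {len(indices)} examples")
        # Draw k distinct indices from this class without replacement.
        chosen.extend(rng.sample(indices, k))
    return sorted(chosen)


# Toy stand-in for a 10-class label list with 50 examples per class.
toy_labels = [c for c in range(10) for _ in range(50)]
subset = sample_k_per_class(toy_labels, k=10)
print(len(subset))  # 10 classes x 10 examples per class = 100
```

Fixing the seed keeps the subset reproducible, which matters when results on very small training sets are compared across runs.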

Published at ECCV 2020.

Results from the Paper


 Ranked #1 on Out-of-Distribution Generalization on ImageNet-W (using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Image Classification | CIFAR-10 | BiT-M (ResNet) | Percentage correct | 98.91 | #26 |
| Image Classification | CIFAR-10 | BiT-L (ResNet) | Percentage correct | 99.37 | #7 |
| Image Classification | CIFAR-100 | BiT-L (ResNet) | Percentage correct | 93.51 | #8 |
| Image Classification | CIFAR-100 | BiT-M (ResNet) | Percentage correct | 92.17 | #15 |
| Image Classification | Flowers-102 | BiT-L (ResNet) | Accuracy | 99.63 | #6 |
| Image Classification | Flowers-102 | BiT-M (ResNet) | Accuracy | 99.30 | #10 |
| Image Classification | ImageNet | BiT-L (ResNet) | Top 1 Accuracy | 87.54% | #89 |
| Image Classification | ImageNet | BiT-M (ResNet) | Top 1 Accuracy | 85.39% | #246 |
| Image Classification | ImageNet | BiT-M (ResNet) | Number of params | 928M | #1004 |
| Image Classification | ImageNet ReaL | BiT-M | Accuracy | 89.02% | #25 |
| Image Classification | ImageNet ReaL | BiT-L | Accuracy | 90.54% | #17 |
| Image Classification | ImageNet ReaL | BiT-L | Params | 928M | #54 |
| Out-of-Distribution Generalization | ImageNet-W | BiT-M (ResNet-50v2, IN-21k) | IN-W Gap | -8.6 | #1 |
| Out-of-Distribution Generalization | ImageNet-W | BiT-M (ResNet-50v2, IN-21k) | Carton Gap | +28 | #1 |
| Image Classification | ObjectNet | BiT-L (ResNet-152x4) | Top-5 Accuracy | 80 | #2 |
| Image Classification | ObjectNet | BiT-L (ResNet-152x4) | Top-1 Accuracy | 58.7 | #21 |
| Image Classification | ObjectNet | BiT-S (ResNet-152x4) | Top-5 Accuracy | 57 | #13 |
| Image Classification | ObjectNet | BiT-S (ResNet-152x4) | Top-1 Accuracy | 36.0 | #50 |
| Image Classification | ObjectNet | BiT-M (ResNet-152x4) | Top-5 Accuracy | 69 | #5 |
| Image Classification | ObjectNet | BiT-M (ResNet-152x4) | Top-1 Accuracy | 47.0 | #33 |
| Image Classification | ObjectNet (Bounding Box) | BiT-S (ResNet) | Top 5 Accuracy | 64.4 | #3 |
| Image Classification | ObjectNet (Bounding Box) | BiT-L (ResNet) | Top 5 Accuracy | 85.1 | #1 |
| Image Classification | ObjectNet (Bounding Box) | BiT-M (ResNet) | Top 5 Accuracy | 76.0 | #2 |
| Image Classification | OmniBenchmark | BiT-M | Average Top-1 Accuracy | 40.4 | #6 |
| Fine-Grained Image Classification | Oxford 102 Flowers | BiT-M (ResNet) | Top-1 Error Rate | 0.70 | #2 |
| Fine-Grained Image Classification | Oxford 102 Flowers | BiT-M (ResNet) | Accuracy | 99.30% | #4 |
| Fine-Grained Image Classification | Oxford 102 Flowers | BiT-L (ResNet) | Top-1 Error Rate | 0.37 | #1 |
| Fine-Grained Image Classification | Oxford 102 Flowers | BiT-L (ResNet) | Accuracy | 99.63% | #2 |
| Fine-Grained Image Classification | Oxford-IIIT Pets | BiT-L (ResNet) | Accuracy | 96.62 | #2 |
| Fine-Grained Image Classification | Oxford-IIIT Pets | BiT-L (ResNet) | Top-1 Error Rate | 3.38% | #2 |
| Fine-Grained Image Classification | Oxford-IIIT Pets | BiT-M (ResNet) | Accuracy | 94.47 | #5 |
| Fine-Grained Image Classification | Oxford-IIIT Pets | BiT-M (ResNet) | Top-1 Error Rate | 5.53% | #3 |
| Image Classification | VTAB-1k | BiT-S | Top-1 Accuracy | 66.9 | #15 |
| Image Classification | VTAB-1k | BiT-M | Top-1 Accuracy | 70.6 | #11 |
| Image Classification | VTAB-1k | BiT-L | Top-1 Accuracy | 76.3 | #6 |
| Image Classification | VTAB-1k | BiT-L (50 hypers/task) | Top-1 Accuracy | 78.72 | #2 |
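
Several of the fine-grained results above report both an accuracy and a top-1 error rate for the same model. The two are complements (error = 100 - accuracy), so they can be cross-checked. The short sketch below does this for the four Oxford 102 Flowers and Oxford-IIIT Pets entries, using values copied from the results above; the variable and function names are illustrative assumptions.

```python
# (dataset, model, accuracy %, reported top-1 error %), copied from the results above.
reported = [
    ("Oxford 102 Flowers", "BiT-M (ResNet)", 99.30, 0.70),
    ("Oxford 102 Flowers", "BiT-L (ResNet)", 99.63, 0.37),
    ("Oxford-IIIT Pets", "BiT-L (ResNet)", 96.62, 3.38),
    ("Oxford-IIIT Pets", "BiT-M (ResNet)", 94.47, 5.53),
]


def error_from_accuracy(accuracy_pct):
    """Top-1 error rate in percent, as the complement of top-1 accuracy."""
    return 100.0 - accuracy_pct


# A row is consistent if the derived error matches the reported one
# to within rounding at two decimal places.
consistent = [
    abs(error_from_accuracy(acc) - err) < 0.005
    for _, _, acc, err in reported
]

for (dataset, model, acc, err), ok in zip(reported, consistent):
    print(f"{dataset:20s} {model:15s} acc={acc:.2f} err={err:.2f} consistent={ok}")
```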

Methods