TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Zero-Shot Transfer Image Classification	Food-101	MAWS (ViT-2B)	Top 1 Accuracy	96.2	# 1
Image Classification	ImageNet	MAWS (ViT-6.5B)	Top 1 Accuracy	90.1%	# 16
Image Classification	ImageNet	MAWS (ViT-6.5B)	Number of params	6500M	# 977
Image Classification	ImageNet	MAWS (ViT-B)	Top 1 Accuracy	86.8%	# 121
Image Classification	ImageNet	MAWS (ViT-L)	Top 1 Accuracy	88.8%	# 38
Zero-Shot Transfer Image Classification	ImageNet	MAWS (ViT-2B)	Accuracy (Private)	82.1	# 12
Image Classification	ImageNet	MAWS (ViT-2B)	Top 1 Accuracy	89.8%	# 21
Image Classification	ImageNet	MAWS (ViT-2B)	Number of params	2000M	# 965
Image Classification	ImageNet	MAWS (ViT-H)	Top 1 Accuracy	89.5%	# 28
Image Classification	ImageNet	MAWS (ViT-H)	Number of params	650M	# 945
Zero-Shot Transfer Image Classification	ImageNet	MAWS (ViT-H)	Accuracy (Private)	81.1	# 15
Few-Shot Image Classification	ImageNet - 10-shot	MAWS (ViT-H)	Top 1 Accuracy	82.5	# 4
Few-Shot Image Classification	ImageNet - 10-shot	MAWS (ViT-2B)	Top 1 Accuracy	83.7	# 3
Few-Shot Image Classification	ImageNet - 10-shot	MAWS (ViT-6.5B)	Top 1 Accuracy	84.6	# 1
Few-Shot Image Classification	ImageNet - 1-shot	MAWS (ViT-H)	Top 1 Accuracy	57.1	# 8
Few-Shot Image Classification	ImageNet - 1-shot	MAWS (ViT-6.5B)	Top 1 Accuracy	63.6	# 2
Few-Shot Image Classification	ImageNet - 1-shot	MAWS (ViT-2B)	Top 1 Accuracy	62.1	# 7
Few-Shot Image Classification	ImageNet - 5-shot	MAWS (ViT-6.5B)	Top 1 Accuracy	82.6	# 2
Few-Shot Image Classification	ImageNet - 5-shot	MAWS (ViT-H)	Top 1 Accuracy	79.8	# 4
Few-Shot Image Classification	ImageNet - 5-shot	MAWS (ViT-2B)	Top 1 Accuracy	81.5	# 3
Image Classification	ImageNet ReaL	MAWS (ViT-H)	Accuracy	90.8%	# 12
Image Classification	ImageNet ReaL	MAWS (ViT-6.5B)	Accuracy	91.1%	# 5
Image Classification	ImageNet ReaL	MAWS (ViT-2B)	Accuracy	90.9%	# 9
Image Classification	ImageNet V2	MAWS (ViT-6.5B)	Top 1 Accuracy	84.0	# 4
Image Classification	ImageNet V2	MAWS (ViT-2B)	Top 1 Accuracy	83.0	# 7
Image Classification	iNaturalist 2018	MAWS (ViT-2B)	Top-1 Accuracy	91.3%	# 3
Few-Shot Image Classification	iNaturalist 2018 - 10-shot	MAWS (ViT-2B)	Top 1 Accuracy	80.3	# 1
Few-Shot Image Classification	iNaturalist 2018 - 1-shot	MAWS (ViT-2B)	Top 1 Accuracy	35.5	# 1
Few-Shot Image Classification	iNaturalist 2018 - 5-shot	MAWS (ViT-2B)	Top 1 Accuracy	72.8	# 1
Image Classification	ObjectNet	MAWS (ViT-H)	Top-1 Accuracy	72.6	# 9
Image Classification	ObjectNet	MAWS (ViT-2B)	Top-1 Accuracy	75.8	# 8
Image Classification	ObjectNet	MAWS (ViT-6.5B)	Top-1 Accuracy	77.9	# 7
Action Recognition	Something-Something V2	MAWS (ViT-L)	Top-1 Accuracy	74.4	# 14

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/zero-shot-transfer-image-classification-on-17)](https://paperswithcode.com/sota/zero-shot-transfer-image-classification-on-17?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-imagenet-10)](https://paperswithcode.com/sota/few-shot-image-classification-on-imagenet-10?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-inaturalist-3)](https://paperswithcode.com/sota/few-shot-image-classification-on-inaturalist-3?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-inaturalist-1)](https://paperswithcode.com/sota/few-shot-image-classification-on-inaturalist-1?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-inaturalist-2)](https://paperswithcode.com/sota/few-shot-image-classification-on-inaturalist-2?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-imagenet-1-1)](https://paperswithcode.com/sota/few-shot-image-classification-on-imagenet-1-1?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/few-shot-image-classification-on-imagenet-5)](https://paperswithcode.com/sota/few-shot-image-classification-on-imagenet-5?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/image-classification-on-inaturalist-2018)](https://paperswithcode.com/sota/image-classification-on-inaturalist-2018?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/image-classification-on-imagenet-v2)](https://paperswithcode.com/sota/image-classification-on-imagenet-v2?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/image-classification-on-imagenet-real)](https://paperswithcode.com/sota/image-classification-on-imagenet-real?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/image-classification-on-objectnet)](https://paperswithcode.com/sota/image-classification-on-objectnet?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/zero-shot-transfer-image-classification-on-1)](https://paperswithcode.com/sota/zero-shot-transfer-image-classification-on-1?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/action-recognition-in-videos-on-something)](https://paperswithcode.com/sota/action-recognition-in-videos-on-something?p=the-effectiveness-of-mae-pre-pretraining-for)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/the-effectiveness-of-mae-pre-pretraining-for/image-classification-on-imagenet)](https://paperswithcode.com/sota/image-classification-on-imagenet?p=the-effectiveness-of-mae-pre-pretraining-for)`

The effectiveness of MAE pre-pretraining for billion-scale pretraining

ICCV 2023 · Mannat Singh, Quentin Duval, Kalyan Vasudev Alwala, Haoqi Fan, Vaibhav Aggarwal, Aaron Adcock, Armand Joulin, Piotr Dollár, Christoph Feichtenhofer, Ross Girshick, Rohit Girdhar, Ishan Misra

This paper revisits the standard pretrain-then-finetune paradigm used in computer vision for visual recognition tasks. Typically, state-of-the-art foundation models are pretrained using large-scale (weakly) supervised datasets with billions of images. We introduce an additional pre-pretraining stage that is simple and uses the self-supervised MAE technique to initialize the model. While MAE has only been shown to scale with the size of models, we find that it scales with the size of the training dataset as well. Thus, our MAE-based pre-pretraining scales with both model and data size, making it applicable for training foundation models. Pre-pretraining consistently improves both model convergence and downstream transfer performance across a range of model scales (millions to billions of parameters) and dataset sizes (millions to billions of images). We measure the effectiveness of pre-pretraining on 10 different visual recognition tasks spanning image classification, video recognition, object detection, low-shot classification and zero-shot recognition. Our largest model achieves new state-of-the-art results on iNaturalist-18 (91.7%), ImageNet-ReaL (91.1%), 1-shot ImageNet-1k (63.6%), and zero-shot transfer on Food-101 (96.2%). Our study reveals that model initialization plays a significant role, even for web-scale pretraining with billions of images, and our models are available publicly.
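The abstract names MAE as the self-supervised technique used for pre-pretraining but does not spell out its mechanics. As a rough illustration only, the sketch below shows the random patch masking step that MAE is built around: most patches are hidden and the model sees only the rest. The function name `random_masking`, the 75% mask ratio, and the 16x16 patch size are assumptions taken from the commonly described MAE setup, not details stated on this page, and this is not the authors' implementation.

```python
import random


def random_masking(num_patches, mask_ratio=0.75, seed=None):
    """Split patch indices into (visible, masked) lists, MAE-style.

    Hypothetical helper for illustration; mask_ratio=0.75 is an assumed default.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    rng = random.Random(seed)
    indices = list(range(num_patches))
    rng.shuffle(indices)  # random permutation of patch positions
    num_keep = int(num_patches * (1.0 - mask_ratio))
    visible = sorted(indices[:num_keep])  # patches the encoder would see
    masked = sorted(indices[num_keep:])   # patches to be reconstructed
    return visible, masked


# Assumed example: a 224x224 image cut into 16x16 patches has 14 * 14 = 196 patches.
visible, masked = random_masking(196, mask_ratio=0.75, seed=0)
print(len(visible), len(masked))  # 49 147
```

In the pipeline the abstract describes, a model initialized this way would then go through weakly supervised pretraining and finetuning; those stages are not sketched here.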

Code

facebookresearch/maws official

Tasks

Action Classification

Action Recognition

Few-Shot Image Classification

Image Classification

Object Detection

Video Classification

Video Recognition

Zero-Shot Learning

Zero-Shot Transfer Image Classification

Datasets

ImageNet

MS COCO

Kinetics

ImageNet-1K

Kinetics 400

Food-101

iNaturalist

LVIS

Something-Something V2

ObjectNet

Results from the Paper

Ranked #1 on Few-Shot Image Classification on ImageNet - 10-shot (using extra training data)

Methods

MAE
