iBOT: Image BERT Pre-Training with Online Tokenizer

15 Nov 2021  Β·  Jinghao Zhou, Chen Wei, Huiyu Wang, Wei Shen, Cihang Xie, Alan Yuille, Tao Kong Β·

The success of language Transformers is primarily attributed to the pretext task of masked language modeling (MLM), where texts are first tokenized into semantically meaningful pieces. In this work, we study masked image modeling (MIM) and indicate the advantages and challenges of using a semantically meaningful visual tokenizer. We present a self-supervised framework, iBOT, that can perform masked prediction with an online tokenizer. Specifically, we perform self-distillation on masked patch tokens and take the teacher network as the online tokenizer, along with self-distillation on the class token to acquire visual semantics. The online tokenizer is jointly learnable with the MIM objective and dispenses with a multi-stage training pipeline in which the tokenizer must be pre-trained beforehand. We show the prominence of iBOT by achieving 82.3% linear-probing accuracy and 87.8% fine-tuning accuracy on ImageNet-1K. Beyond the state-of-the-art image classification results, we underline emerging local semantic patterns, which help the models obtain strong robustness against common corruptions and achieve leading results on dense downstream tasks, e.g., object detection, instance segmentation, and semantic segmentation.
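The core of the method described above is a pair of losses: a MIM loss that distills the EMA teacher's output on full patch tokens into the student's output on masked patch tokens (the teacher acting as the "online tokenizer"), and a [CLS]-token self-distillation loss for global semantics. The following is a minimal sketch of that wiring, assuming toy stand-in networks; the paper uses ViT backbones, a learnable mask token, multi-crop augmented views, and target centering, all omitted here for brevity.

```python
# Hedged sketch of the iBOT objective: MIM self-distillation on masked patch
# tokens plus [CLS]-token self-distillation, with an EMA-updated teacher.
# The networks below are simplified stand-ins, not the paper's ViT backbones.
import torch
import torch.nn as nn
import torch.nn.functional as F

D_IN, D_OUT = 16, 32  # toy token dimension / projection-head output dimension

def make_net():
    # Stand-in for a ViT backbone + projection head (assumption: a per-token
    # MLP suffices to illustrate how the losses are wired together).
    return nn.Sequential(nn.Linear(D_IN, 64), nn.GELU(), nn.Linear(64, D_OUT))

student, teacher = make_net(), make_net()
teacher.load_state_dict(student.state_dict())
for p in teacher.parameters():
    p.requires_grad_(False)  # teacher is updated only via EMA, never by SGD

def H(t_logits, s_logits, t_temp=0.04, s_temp=0.1):
    # Cross-entropy between sharpened teacher targets and student predictions.
    t = F.softmax(t_logits / t_temp, dim=-1)
    return -(t * F.log_softmax(s_logits / s_temp, dim=-1)).sum(-1)

def ibot_loss(tokens, cls_tok, mask):
    # tokens: (B, N, D_IN) patch tokens; cls_tok: (B, D_IN); mask: (B, N) bool.
    # Masked patches are zeroed here; the paper uses a learnable mask token.
    masked = torch.where(mask[..., None], torch.zeros_like(tokens), tokens)
    s_patch = student(masked)               # student sees the corrupted view
    s_cls = student(cls_tok)
    with torch.no_grad():
        t_patch = teacher(tokens)           # online tokenizer: teacher on
        t_cls = teacher(cls_tok)            # the uncorrupted tokens
    l_mim = (H(t_patch, s_patch) * mask).sum() / mask.sum().clamp(min=1)
    l_cls = H(t_cls, s_cls).mean()          # paper: across two augmented views
    return l_mim + l_cls

@torch.no_grad()
def ema_update(m=0.996):
    # Teacher parameters track an exponential moving average of the student.
    for ps, pt in zip(student.parameters(), teacher.parameters()):
        pt.mul_(m).add_(ps, alpha=1 - m)

# One toy training step on random data.
tokens = torch.randn(2, 8, D_IN)
cls_tok = torch.randn(2, D_IN)
mask = torch.zeros(2, 8, dtype=torch.bool)
mask[:, :3] = True                          # mask the first 3 patches
loss = ibot_loss(tokens, cls_tok, mask)
loss.backward()
ema_update()
```

Because the tokenizer (the teacher) is just the EMA of the student, it trains jointly with the MIM objective, which is what removes the separate tokenizer pre-training stage mentioned in the abstract.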

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | iBOT (ViT-B/16) (linear head) | Validation mIoU | 38.3 | #214 |
| Semantic Segmentation | ADE20K | iBOT (ViT-S/16) | Validation mIoU | 45.4 | #183 |
| Semantic Segmentation | ADE20K | iBOT (ViT-B/16) | Validation mIoU | 50.0 | #114 |
| Instance Segmentation | COCO test-dev | iBOT (ViT-B/16) | mask AP | 44.2 | #43 |
| Instance Segmentation | COCO test-dev | iBOT (ViT-S/16) | mask AP | 42.6 | #51 |
| Object Detection | COCO test-dev | iBOT (ViT-S/16) | box mAP | 49.4 | #90 |
| Object Detection | COCO test-dev | iBOT (ViT-B/16) | box mAP | 51.2 | #78 |
| Unsupervised Image Classification | ImageNet | iBOT (ViT-S/16) | Accuracy (%) | 43.4 | #2 |
| Unsupervised Image Classification | ImageNet | iBOT (ViT-S/16) | ARI | 32.8 | #1 |
| Self-Supervised Image Classification | ImageNet | iBOT (ViT-L/16) (IN22k) | Top 1 Accuracy | 82.3% | #11 |
| Self-Supervised Image Classification | ImageNet | iBOT (ViT-L/16) (IN22k) | Number of Params | 307M | #16 |
| Self-Supervised Image Classification | ImageNet | iBOT (ViT-L/16) | Top 1 Accuracy | 81.3% | #16 |
| Self-Supervised Image Classification | ImageNet | iBOT (ViT-L/16) | Number of Params | 307M | #16 |
| Semi-Supervised Image Classification | ImageNet (1% labeled data) | iBOT (ViT-S/16) | Top 1 Accuracy | 61.9% | #30 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-L/16, 512) | Top 1 Accuracy | 87.8% | #7 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-L/16, 512) | Number of Params | 307M | #13 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-L/16) | Top 1 Accuracy | 86.6% | #12 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-L/16) | Number of Params | 307M | #13 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-L/16) | Top 1 Accuracy | 84.8% | #26 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-B/16) | Top 1 Accuracy | 84.4% | #31 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-B/16) | Number of Params | 85M | #39 |
| Self-Supervised Image Classification | ImageNet (finetuned) | iBOT (ViT-B/16) | Top 1 Accuracy | 84.0% | #38 |
