Co-training $2^L$ Submodels for Visual Recognition

9 Dec 2022  ·  Hugo Touvron, Matthieu Cord, Maxime Oquab, Piotr Bojanowski, Jakob Verbeek, Hervé Jégou

We introduce submodel co-training, a regularization method related to co-training, self-distillation and stochastic depth. Given a neural network to be trained, for each sample we implicitly instantiate two altered networks, "submodels", with stochastic depth: we activate only a subset of the layers. Each network serves as a soft teacher to the other by providing a loss that complements the regular loss provided by the one-hot label. Our approach, dubbed cosub, uses a single set of weights and does not involve a pre-trained external model or temporal averaging. Experimentally, we show that submodel co-training is effective for training backbones for recognition tasks such as image classification and semantic segmentation. Our approach is compatible with multiple architectures, including RegNet, ViT, PiT, XCiT, Swin and ConvNeXt. Our training strategy improves their results in comparable settings. For instance, a ViT-B pretrained with cosub on ImageNet-21k obtains 87.4% top-1 accuracy at resolution 448 on ImageNet-val.
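To make the training recipe concrete, below is a minimal sketch of one cosub-style training step in PyTorch. It assumes a `model` whose stochastic-depth drop pattern is resampled on every forward pass, so two calls on the same batch yield two different submodels sharing one set of weights. The mixing coefficient `lam` and the use of a KL-divergence soft loss are illustrative choices, not necessarily the exact formulation used in the paper.

```python
# Hypothetical sketch of a cosub-style training step; names and loss details
# are illustrative assumptions, not the authors' reference implementation.
import torch.nn.functional as F

def cosub_step(model, images, targets, optimizer, lam=0.5):
    """One step: each submodel is supervised by the one-hot labels and,
    as a soft teacher, by the detached predictions of the other submodel."""
    logits_a = model(images)   # submodel A: one random subset of layers active
    logits_b = model(images)   # submodel B: a different random subset

    # Hard-label (one-hot) cross-entropy for both submodels.
    ce = F.cross_entropy(logits_a, targets) + F.cross_entropy(logits_b, targets)

    # Soft co-training loss: each submodel distills from the other's
    # stop-gradient predictions (KL divergence chosen here for illustration).
    log_p_a = F.log_softmax(logits_a, dim=-1)
    log_p_b = F.log_softmax(logits_b, dim=-1)
    soft = (
        F.kl_div(log_p_a, F.softmax(logits_b.detach(), dim=-1), reduction="batchmean")
        + F.kl_div(log_p_b, F.softmax(logits_a.detach(), dim=-1), reduction="batchmean")
    )

    loss = (1.0 - lam) * ce + lam * soft
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Note that both forward passes use the same weights; only the set of active layers differs, which is what distinguishes this scheme from co-training two separate models or distilling from a pre-trained teacher.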


Results from the Paper


| Task                 | Dataset  | Model                     | Metric         | Value | Global Rank |
|----------------------|----------|---------------------------|----------------|-------|-------------|
| Image Classification | ImageNet | ViT-S@224 (cosub)         | Top-1 Accuracy | 83.1% | #426        |
| Image Classification | ImageNet | RegNetY-16GF@224 (cosub)  | Top-1 Accuracy | 84.2% | #313        |
| Image Classification | ImageNet | ViT-M@224 (cosub)         | Top-1 Accuracy | 85.0% | #255        |
| Image Classification | ImageNet | PiT-B@224 (cosub)         | Top-1 Accuracy | 85.8% | #187        |
| Image Classification | ImageNet | ConvNeXt-B@224 (cosub)    | Top-1 Accuracy | 85.8% | #187        |
| Image Classification | ImageNet | Swin-B@224 (cosub)        | Top-1 Accuracy | 86.2% | #164        |
| Image Classification | ImageNet | ViT-B@224 (cosub)         | Top-1 Accuracy | 86.3% | #153        |
| Image Classification | ImageNet | Swin-L@224 (cosub)        | Top-1 Accuracy | 87.1% | #103        |
| Image Classification | ImageNet | ViT-L@224 (cosub)         | Top-1 Accuracy | 87.5% | #86         |
| Image Classification | ImageNet | ViT-H@224 (cosub)         | Top-1 Accuracy | 88.0% | #69         |
