Context Autoencoder for Self-Supervised Representation Learning

We present a novel masked image modeling (MIM) approach, the context autoencoder (CAE), for self-supervised representation pretraining. The goal is to pretrain an encoder by solving a pretext task: estimating the masked patches from the visible patches in an image. Our approach first feeds the visible patches into the encoder, extracting their representations. It then predicts the representations of the masked patches from those of the visible patches, operating in the encoded representation space. We introduce an alignment constraint encouraging the representations predicted for the masked patches to agree with the representations of those same patches computed by the encoder. In other words, the predicted representations are expected to lie in the encoded representation space, which we empirically find benefits representation learning. Finally, a decoder maps the predicted masked-patch representations to the targets of the pretext task. In contrast to previous MIM methods (e.g., BEiT) that couple the encoding and pretext-task completion roles, our approach benefits from separating the representation learning (encoding) role from the pretext-task completion role, improving the representation learning capacity and in turn the gains on downstream tasks. In addition, we offer explanations for why contrastive pretraining and supervised pretraining perform similarly and why MIM can potentially perform better. We demonstrate the effectiveness of CAE through superior transfer performance on downstream tasks: semantic segmentation, object detection, and instance segmentation.
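The pipeline described above (encode visible patches, regress masked-patch representations under an alignment constraint, then decode to the pretext targets) can be sketched in PyTorch. This is a minimal illustration, not the paper's implementation: the module sizes, the self-attention regressor over concatenated tokens, the MSE alignment and reconstruction losses, and the stop-gradient on the alignment targets are all simplifying assumptions.

```python
import torch
import torch.nn as nn


class ContextAutoencoderSketch(nn.Module):
    """Illustrative CAE-style pretraining flow (assumed, simplified modules)."""

    def __init__(self, dim=64, heads=4, depth=2):
        super().__init__()
        # Encoder: sees visible patches only.
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True), depth)
        # Latent regressor: predicts masked-patch representations from visible ones.
        self.regressor = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(dim, heads, dim * 4, batch_first=True), depth)
        # Decoder: maps predicted representations to the pretext-task targets
        # (here simply the raw patch values, an assumption for illustration).
        self.decoder = nn.Linear(dim, dim)
        self.mask_token = nn.Parameter(torch.zeros(1, 1, dim))

    def forward(self, patches, mask):
        # patches: (B, N, D); mask: (B, N) bool, True = masked.
        # Assumes every sample masks the same number of patches.
        B, N, D = patches.shape
        z_vis = self.encoder(patches[~mask].view(B, -1, D))
        # Alignment targets: encoder applied to the masked patches, no gradient.
        with torch.no_grad():
            z_target = self.encoder(patches[mask].view(B, -1, D))
        # Predict masked representations from visible ones via mask queries.
        queries = self.mask_token.expand(B, z_target.shape[1], D)
        z_pred = self.regressor(torch.cat([z_vis, queries], dim=1))[:, z_vis.shape[1]:]
        # Alignment constraint: predictions should lie in the encoded space.
        align_loss = nn.functional.mse_loss(z_pred, z_target)
        # Pretext task: decode predicted representations to the targets.
        recon = self.decoder(z_pred)
        recon_loss = nn.functional.mse_loss(recon, patches[mask].view(B, -1, D))
        return recon_loss + align_loss
```

Note the separation of roles the abstract emphasizes: only `self.encoder` does representation learning, while the regressor and decoder absorb the pretext-task completion, so the encoder can be transferred alone to downstream tasks.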

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | CAE (ViT-L, UperNet) | Validation mIoU | 54.7 | #22 |
| Object Detection | COCO minival | CAE (ViT-L, Mask R-CNN, 1x schedule) | box AP | 54.5 | #23 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-L/16, Attentive) | Top 1 Accuracy | 81.2% | #5 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-L/16, Attentive) | Number of Params | 307M | #13 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-L/16) | Top 1 Accuracy | 78.1% | #26 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-L/16) | Number of Params | 307M | #13 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-B/16) | Top 1 Accuracy | 77.1% | #34 |
| Self-Supervised Image Classification | ImageNet | CAE (ViT-B/16) | Number of Params | 86M | #31 |
| Self-Supervised Image Classification | ImageNet (finetuned) | CAE (ViT-L/16) | Top 1 Accuracy | 86.3% | #6 |
| Self-Supervised Image Classification | ImageNet (finetuned) | CAE (ViT-L/16) | Number of Params | 307M | #7 |