Could Giant Pretrained Image Models Extract Universal Representations?

3 Nov 2022 · Yutong Lin, Ze Liu, Zheng Zhang, Han Hu, Nanning Zheng, Stephen Lin, Yue Cao

Frozen pretrained models have become a viable alternative to the pretraining-then-finetuning paradigm for transfer learning. However, with frozen models there are relatively few parameters available for adapting to downstream tasks, which is problematic in computer vision, where tasks vary significantly in input/output format and in the type of information that is of value. In this paper, we present a study of frozen pretrained models applied to diverse and representative computer vision tasks, including object detection, semantic segmentation, and video action recognition. From this empirical analysis, our work answers the questions of which pretraining task fits best with the frozen setting, how to make the frozen setting more flexible to various downstream tasks, and what effect larger model sizes have. We additionally examine the upper bound of performance using a giant frozen pretrained model with 3 billion parameters (SwinV2-G) and find that it reaches competitive performance on a varied set of major benchmarks with only one shared frozen base network: 60.0 box mAP and 52.2 mask mAP on COCO object detection test-dev, 57.6 val mIoU on ADE20K semantic segmentation, and 81.7 top-1 accuracy on Kinetics-400 action recognition. With this work, we hope to bring greater attention to this promising path of freezing pretrained image models.
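The frozen-backbone setup the abstract describes can be illustrated with a minimal, stdlib-only sketch: a fixed "backbone" extracts features whose weights are never updated, and only a small task head is trained on top of them. The toy linear backbone, the regression task, and all names here are illustrative assumptions, not the paper's actual models or training recipe.

```python
import random

random.seed(0)

# Toy "frozen backbone": a fixed random linear map from 4-D inputs to
# 3-D features. Its weights are set once and never updated during
# training -- only the task head's weights change (hypothetical stand-in
# for a large pretrained image model).
BACKBONE_W = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]

def backbone(x):
    # Frozen feature extractor: no updates ever touch BACKBONE_W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in BACKBONE_W]

# Trainable task head: a single linear unit on top of the frozen features.
head_w = [0.0, 0.0, 0.0]

def head(feats):
    return sum(w * f for w, f in zip(head_w, feats))

# Toy regression data: the target is a fixed linear function of the input.
inputs = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(64)]
data = [(x, x[0] - 2 * x[1]) for x in inputs]

def mse():
    return sum((head(backbone(x)) - y) ** 2 for x, y in data) / len(data)

frozen_snapshot = [row[:] for row in BACKBONE_W]
initial_mse = mse()

lr = 0.1
for _ in range(200):
    for x, y in data:
        feats = backbone(x)
        err = head(feats) - y
        # SGD step on the head ONLY; the backbone stays frozen.
        for i in range(3):
            head_w[i] -= lr * err * feats[i]

final_mse = mse()
backbone_unchanged = (BACKBONE_W == frozen_snapshot)
```

After training, `final_mse` drops below `initial_mse` while `BACKBONE_W` is bit-for-bit unchanged, which is the essence of the frozen setting: one shared, untouched base network, with per-task capacity living entirely in the heads.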

| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Semantic Segmentation | ADE20K | Frozen Backbone, SwinV2-G-ext22K (Mask2Former) | Validation mIoU | 57.6 | #24 |
| Instance Segmentation | COCO minival | Frozen Backbone, SwinV2-G-ext22K (HTC) | mask AP | 51.6 | #17 |
| Object Detection | COCO minival | Frozen Backbone, SwinV2-G-ext22K (HTC) | box AP | 59.3 | #27 |
| Action Recognition in Videos | Kinetics-400 | Frozen Backbone, SwinV2-G-ext22K (Video-Swin) | Top-1 Accuracy | 81.7 | #3 |