TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Instance Segmentation	COCO minival	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	mask AP	48.8	# 30
Open Vocabulary Object Detection	LVIS v1.0	X-Paste	AP novel-LVIS base training	21.4	# 16
Open Vocabulary Object Detection	LVIS v1.0	X-Paste	AP novel-Unrestricted open-vocabulary training	22.8	# 4
Instance Segmentation	LVIS v1.0 val	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	mask AP	45.4	# 7
Instance Segmentation	LVIS v1.0 val	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	mask APr	43.8	# 1
Object Detection	LVIS v1.0 val	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	box AP	50.9	# 8
Object Detection	LVIS v1.0 val	CenterNet2 (Swin-L w/ X-Paste + Copy-Paste)	box APr	48.7	# 3

X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion

7 Dec 2022 · Hanqing Zhao, Dianmo Sheng, Jianmin Bao, Dongdong Chen, Dong Chen, Fang Wen, Lu Yuan, Ce Liu, Wenbo Zhou, Qi Chu, Weiming Zhang, Nenghai Yu

Copy-Paste is a simple and effective data augmentation strategy for instance segmentation. By randomly pasting object instances onto new background images, it creates new training data for free and significantly boosts segmentation performance, especially for rare object categories. Although more diverse, higher-quality object instances yield larger gains from Copy-Paste, previous works take their object instances either from human-annotated instance segmentation datasets or from renderings of 3D object models, and both sources are too expensive to scale up to good diversity. In this paper, we revisit Copy-Paste at scale with the power of newly emerged zero-shot recognition models (e.g., CLIP) and text-to-image models (e.g., StableDiffusion). We demonstrate for the first time that using a text-to-image model to generate images, or a zero-shot recognition model to filter noisily crawled images, for different object categories is a feasible way to make Copy-Paste truly scalable. To this end, we design a data acquisition and processing framework, dubbed "X-Paste", and conduct a systematic study on top of it. On the LVIS dataset, X-Paste provides substantial improvements over the strong baseline CenterNet2 with Swin-L as the backbone. Specifically, it achieves gains of +2.6 box AP and +2.1 mask AP on all classes, and even larger gains of +6.8 box AP and +6.5 mask AP on long-tail classes. Our code and models are available at https://github.com/yoctta/XPaste.
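The basic Copy-Paste step described in the abstract (pasting masked object instances at random positions on a background image) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names `paste_instance` and `copy_paste` are hypothetical, instances are assumed to be already cropped with boolean masks, and the scale jitter, blending, and CLIP or StableDiffusion steps of a real pipeline are left out.

```python
import numpy as np


def paste_instance(background, instance_rgb, instance_mask, top, left):
    """Paste one masked object onto a copy of `background`.

    background:    (H, W, 3) uint8 image
    instance_rgb:  (h, w, 3) uint8 crop of the object
    instance_mask: (h, w) bool mask, True where the object is
    Returns the new image and the full-size (H, W) bool mask of the pasted object.
    """
    H, W = background.shape[:2]
    h, w = instance_mask.shape
    if top < 0 or left < 0 or top + h > H or left + w > W:
        raise ValueError("instance does not fit at the requested position")

    out = background.copy()
    region = out[top:top + h, left:left + w]  # a view into `out`
    region[instance_mask] = instance_rgb[instance_mask]  # overwrite object pixels only

    full_mask = np.zeros((H, W), dtype=bool)
    full_mask[top:top + h, left:left + w] = instance_mask
    return out, full_mask


def copy_paste(background, existing_masks, instances, rng):
    """Paste each (rgb, mask, label) instance at a uniformly random location.

    existing_masks: list of (H, W) bool masks already annotated on the background
    Returns the augmented image, the updated masks (existing masks first, then
    one mask per pasted object), and the labels of the pasted objects.
    """
    image = background.copy()
    masks = [m.copy() for m in existing_masks]
    new_labels = []
    H, W = background.shape[:2]
    for rgb, mask, label in instances:
        h, w = mask.shape
        top = int(rng.integers(0, H - h + 1))
        left = int(rng.integers(0, W - w + 1))
        image, pasted = paste_instance(image, rgb, mask, top, left)
        # A pasted object occludes whatever was underneath it.
        masks = [m & ~pasted for m in masks]
        masks.append(pasted)
        new_labels.append(label)
    return image, masks, new_labels


# Example: paste one L-shaped white object onto a black 8x8 background.
rng = np.random.default_rng(0)
bg = np.zeros((8, 8, 3), dtype=np.uint8)
obj = np.full((2, 2, 3), 255, dtype=np.uint8)
obj_mask = np.array([[True, False], [True, True]])
image, masks, labels = copy_paste(bg, [], [(obj, obj_mask, "teapot")], rng)
```

Subtracting each pasted mask from the earlier ones keeps the annotations consistent with what is actually visible in the augmented image, which matters when the masks are used as instance segmentation targets.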

Code

yoctta/xpaste official

Tasks

Data Augmentation

Instance Segmentation

Object

Object Detection

Open Vocabulary Object Detection

Segmentation

Semantic Segmentation

Zero-Shot Learning

Datasets

ImageNet

MS COCO

LVIS

Results from the Paper

Ranked #7 on Instance Segmentation on LVIS v1.0 val

Methods

Copy-Paste
