TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Unsupervised Semantic Segmentation with Language-image Pre-training	ADE20K	TCL	Mean IoU (val)	17.1	# 2
Semantic Segmentation	CC3M-TagMask	TCL	mIoU	60.4	# 2
Unsupervised Semantic Segmentation with Language-image Pre-training	Cityscapes val	TCL	mIoU	24.0	# 5
Unsupervised Semantic Segmentation with Language-image Pre-training	COCO-Object	TCL	mIoU	31.6	# 4
Unsupervised Semantic Segmentation with Language-image Pre-training	COCO-Stuff-171	TCL	mIoU	22.4	# 3
Open Vocabulary Semantic Segmentation	PASCAL Context-59	TCL	mIoU	33.9	# 16
Unsupervised Semantic Segmentation with Language-image Pre-training	PASCAL Context-59	TCL	mIoU	33.9	# 3
Unsupervised Semantic Segmentation with Language-image Pre-training	PASCAL VOC	TCL	mIoU	55.0	# 3
Unsupervised Semantic Segmentation with Language-image Pre-training	PascalVOC-20	TCL	mIoU	83.2	# 2
Open Vocabulary Semantic Segmentation	PascalVOC-20	TCL	mIoU	83.2	# 12

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

CVPR 2023 · Junbum Cha, Jonghwan Mun, Byungseok Roh

We tackle open-world semantic segmentation, which aims to learn to segment arbitrary visual concepts in images using only image-text pairs, without dense annotations. Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task. However, these CL-based methods suffer from a train-test discrepancy: they consider only image-text alignment during training, whereas segmentation requires region-text alignment at test time. In this paper, we propose a novel Text-grounded Contrastive Learning (TCL) framework that enables a model to learn region-text alignment directly. Our method generates a segmentation mask for a given text, extracts a text-grounded image embedding from the masked region, and aligns it with the text embedding via TCL. By learning region-text alignment directly, our framework encourages the model to improve the quality of the generated segmentation masks. In addition, for a rigorous and fair comparison, we present a unified evaluation protocol with eight widely used semantic segmentation datasets. TCL achieves state-of-the-art zero-shot segmentation performance by large margins on all datasets. Code is available at https://github.com/kakaobrain/tcl.
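
The three steps named in the abstract (generate a mask for a text, pool an image embedding from the masked region, align it with the text embedding contrastively) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sigmoid-of-cosine-similarity mask, the symmetric InfoNCE loss, the temperatures, and all function names (`soft_masks`, `masked_pool`, `info_nce`) are assumptions made for illustration only. The official code is in the kakaobrain/tcl repository.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Scale vectors along the last axis to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)

def soft_masks(patch_feats, text_embs, temperature=0.1):
    """Hypothetical mask generator: sigmoid of patch-text cosine similarity.

    patch_feats: (B, N, D) patch features, text_embs: (B, D) text embeddings.
    Returns soft masks of shape (B, N) with values in (0, 1).
    """
    sim = np.einsum("bnd,bd->bn", l2_normalize(patch_feats), l2_normalize(text_embs))
    return 1.0 / (1.0 + np.exp(-sim / temperature))

def masked_pool(patch_feats, masks, eps=1e-8):
    """Mask-weighted average of patch features: one embedding per image, (B, D)."""
    weights = masks / (masks.sum(axis=1, keepdims=True) + eps)
    return np.einsum("bn,bnd->bd", weights, patch_feats)

def info_nce(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss; matching pairs sit on the diagonal of the logits."""
    logits = l2_normalize(image_embs) @ l2_normalize(text_embs).T / temperature

    def cross_entropy_diag(l):
        l = l - l.max(axis=1, keepdims=True)  # shift for numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    return 0.5 * (cross_entropy_diag(logits) + cross_entropy_diag(logits.T))

# Toy batch: 4 image-text pairs, 6 patches per image, 8-dimensional features.
rng = np.random.default_rng(0)
patch_feats = rng.normal(size=(4, 6, 8))
text_embs = rng.normal(size=(4, 8))

masks = soft_masks(patch_feats, text_embs)      # step 1: mask for the given text
grounded = masked_pool(patch_feats, masks)      # step 2: text-grounded image embedding
loss = info_nce(grounded, text_embs)            # step 3: align with the text embedding
```

In a trainable version, the loss would be backpropagated through the mask generator, which is how region-text alignment can shape the masks themselves rather than only the image-level embedding.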

Code

kakaobrain/tcl official

Tasks

Contrastive Learning

Open Vocabulary Semantic Segmentation

Segmentation

Semantic Segmentation

Unsupervised Semantic Segmentation with Language-image Pre-training

Zero Shot Segmentation

Datasets

Cityscapes

ADE20K

PASCAL Context

COCO-Stuff

PASCAL VOC

CC3M-TagMask

Results from the Paper

Ranked #2 on Semantic Segmentation on CC3M-TagMask

Methods

ALIGN • Contrastive Learning
