TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	Precision@0.5	0.655	# 12
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	Precision@0.9	0.098	# 13
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	IoU overall	0.653	# 15
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	IoU mean	0.573	# 12
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	Precision@0.6	0.592	# 13
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	Precision@0.7	0.506	# 13
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	Precision@0.8	0.342	# 12
Referring Expression Segmentation	A2D Sentences	CMPC-V (I3D)	AP	0.404	# 11
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	Precision@0.5	0.590	# 18
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	Precision@0.9	0.068	# 17
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	IoU overall	0.649	# 16
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	IoU mean	0.515	# 19
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	Precision@0.6	0.527	# 18
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	Precision@0.7	0.434	# 18
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	Precision@0.8	0.284	# 18
Referring Expression Segmentation	A2D Sentences	CMPC-V (R2D)	AP	0.351	# 15
Referring Expression Segmentation	J-HMDB	CMPC-V	Precision@0.5	0.813	# 9
Referring Expression Segmentation	J-HMDB	CMPC-V	Precision@0.6	0.657	# 10
Referring Expression Segmentation	J-HMDB	CMPC-V	Precision@0.7	0.371	# 12
Referring Expression Segmentation	J-HMDB	CMPC-V	Precision@0.8	0.07	# 12
Referring Expression Segmentation	J-HMDB	CMPC-V	Precision@0.9	0.000	# 11
Referring Expression Segmentation	J-HMDB	CMPC-V	AP	0.342	# 7
Referring Expression Segmentation	J-HMDB	CMPC-V	IoU overall	0.616	# 10
Referring Expression Segmentation	J-HMDB	CMPC-V	IoU mean	0.617	# 9
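
The table does not define its metrics. As a rough guide, the sketch below computes them the way referring segmentation benchmarks commonly do. It assumes that Precision@K is the fraction of samples whose IoU exceeds K, that "IoU overall" is total intersection over total union across all samples, and that "IoU mean" is the average of per-sample IoUs. The function names, the strict `>` comparison, and the empty-mask convention are illustrative assumptions, not taken from the paper's evaluation code. AP is omitted.

```python
from typing import Dict, List, Sequence, Tuple

Mask = Sequence[Sequence[int]]  # binary mask given as rows of 0/1 values


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and union of two binary masks."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter, union


def evaluate(
    preds: List[Mask],
    gts: List[Mask],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Dict[str, float]:
    """Compute overall IoU, mean IoU and Precision@K over a set of samples."""
    ious = []
    total_inter = total_union = 0
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        total_inter += inter
        total_union += union
        # Assumption: empty prediction on empty ground truth counts as IoU 1.
        ious.append(inter / union if union else 1.0)

    results = {
        # Pools pixels over all samples, so large objects weigh more.
        "IoU overall": total_inter / total_union if total_union else 1.0,
        # Averages per-sample IoU, so every sample weighs the same.
        "IoU mean": sum(ious) / len(ious),
    }
    for t in thresholds:
        # Assumption: a sample counts as correct when its IoU is strictly above t.
        results[f"Precision@{t}"] = sum(iou > t for iou in ious) / len(ious)
    return results


if __name__ == "__main__":
    preds = [[[1, 1], [0, 0]], [[1, 0], [0, 0]]]
    gts = [[[1, 1], [0, 0]], [[1, 1], [1, 1]]]
    for name, value in evaluate(preds, gts).items():
        print(f"{name}: {value:.3f}")
```

Because "IoU overall" pools pixels while "IoU mean" averages per sample, the two can rank models differently, which is consistent with the differing ranks in the table.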

Cross-Modal Progressive Comprehension for Referring Segmentation

15 May 2021 · Si Liu, Tianrui Hui, Shaofei Huang, Yunchao Wei, Bo Li, Guanbin Li

Given a natural language expression and an image/video, the goal of referring segmentation is to produce pixel-level masks of the entities described by the subject of the expression. Previous approaches tackle this problem by implicit feature interaction and fusion between the visual and linguistic modalities in a one-stage manner. However, humans tend to solve the referring problem progressively, based on the informative words in the expression, i.e., first roughly locating candidate entities and then distinguishing the target one. In this paper, we propose a Cross-Modal Progressive Comprehension (CMPC) scheme to mimic this human behavior and implement it as a CMPC-I (Image) module and a CMPC-V (Video) module to improve referring image and video segmentation models. For image data, our CMPC-I module first employs entity and attribute words to perceive all the related entities that the expression might refer to. Then, the relational words are adopted to highlight the target entity and suppress irrelevant ones through spatial graph reasoning. For video data, our CMPC-V module builds on CMPC-I and further exploits action words to highlight the entity that matches the action cues through temporal graph reasoning. In addition to CMPC, we also introduce a simple yet effective Text-Guided Feature Exchange (TGFE) module to integrate the reasoned multimodal features corresponding to different levels of the visual backbone under the guidance of textual information. In this way, multi-level features can communicate with each other and be mutually refined based on the textual context. Combining CMPC-I or CMPC-V with TGFE forms our image or video referring segmentation framework, and these frameworks achieve new state-of-the-art performance on four referring image segmentation benchmarks and three referring video segmentation benchmarks, respectively.
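
To make the two-stage idea in the abstract concrete, here is a toy sketch. It does not reproduce the CMPC-I module, which operates on learned multimodal features. Candidate regions are first scored against entity and attribute words, then re-weighted by how strongly they are linked to other perceived entities through the relational words. The function name, the hand-picked scores, and the example expression "the man holding a frisbee" are all hypothetical.

```python
from typing import List


def progressive_scores(
    entity_scores: List[float],
    relation_affinity: List[List[float]],
) -> List[float]:
    """Toy two-stage scoring over candidate regions (illustrative only).

    entity_scores[i]: stage 1, how well region i matches the entity and
        attribute words of the expression.
    relation_affinity[i][j]: how well the ordered pair (i, j) matches the
        relational words, e.g. "i holding j".
    """
    n = len(entity_scores)
    refined = []
    for i in range(n):
        # Stage 2: take the strongest relational support that region i
        # receives from any other perceived entity j.
        support = max(
            (relation_affinity[i][j] * entity_scores[j] for j in range(n) if j != i),
            default=0.0,
        )
        # Regions with no relational support are suppressed.
        refined.append(entity_scores[i] * support)
    return refined


if __name__ == "__main__":
    # Hypothetical regions for "the man holding a frisbee".
    regions = ["man_a", "man_b", "frisbee"]
    entity_scores = [0.9, 0.9, 0.8]  # stage 1 cannot separate the two men
    relation_affinity = [
        [0.0, 0.0, 0.9],  # man_a is holding the frisbee
        [0.0, 0.0, 0.1],  # man_b is not
        [0.0, 0.0, 0.0],  # the frisbee holds nothing
    ]
    refined = progressive_scores(entity_scores, relation_affinity)
    best = max(range(len(regions)), key=lambda i: refined[i])
    print(regions[best])
```

In this toy setup the first stage leaves the two men tied, and only the relational step separates them, which is the behavior the abstract attributes to progressive comprehension.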

Code

spyflying/CMPC-Refseg official

Tasks

Attribute

Image Segmentation

Referring Expression Segmentation

Segmentation

Semantic Segmentation

Video Segmentation

Video Semantic Segmentation

Datasets

MS COCO

JHMDB

A2D

Refer-YouTube-VOS

A2D Sentences

Results from the Paper

Ranked #7 on Referring Expression Segmentation on J-HMDB
