TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	A2D Sentences	AAMN	Precision@0.5	0.681	# 11
Referring Expression Segmentation	A2D Sentences	AAMN	Precision@0.9	0.029	# 22
Referring Expression Segmentation	A2D Sentences	AAMN	IoU overall	0.617	# 20
Referring Expression Segmentation	A2D Sentences	AAMN	IoU mean	0.552	# 15
Referring Expression Segmentation	A2D Sentences	AAMN	Precision@0.6	0.629	# 11
Referring Expression Segmentation	A2D Sentences	AAMN	Precision@0.7	0.523	# 11
Referring Expression Segmentation	A2D Sentences	AAMN	Precision@0.8	0.296	# 17
Referring Expression Segmentation	A2D Sentences	AAMN	AP	0.396	# 13
Referring Expression Segmentation	J-HMDB	AAMN	Precision@0.5	0.773	# 11
Referring Expression Segmentation	J-HMDB	AAMN	Precision@0.6	0.627	# 12
Referring Expression Segmentation	J-HMDB	AAMN	Precision@0.7	0.360	# 13
Referring Expression Segmentation	J-HMDB	AAMN	Precision@0.8	0.044	# 17
Referring Expression Segmentation	J-HMDB	AAMN	Precision@0.9	0.000	# 11
Referring Expression Segmentation	J-HMDB	AAMN	AP	0.321	# 9
Referring Expression Segmentation	J-HMDB	AAMN	IoU overall	0.583	# 13
Referring Expression Segmentation	J-HMDB	AAMN	IoU mean	0.576	# 13
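
The metric names in the table above follow the usual evaluation protocol for these benchmarks. As a rough illustration only, the sketch below shows how such numbers are commonly computed from predicted and ground-truth masks. It is not the authors' evaluation code. The function names, the flattened-list mask representation, the strict `>` comparison, and the choice to score an empty union as 0 are all assumptions made for this example. AP is taken here as precision averaged over IoU thresholds 0.50 to 0.95 in steps of 0.05.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical representation: a mask is a flattened sequence of 0/1 pixels.
Mask = Sequence[int]


def intersection_and_union(pred: Mask, gt: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and the union of two binary masks."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter, union


def evaluate(preds: List[Mask], gts: List[Mask]) -> Dict[str, float]:
    """Compute IoU overall, IoU mean, Precision@K and AP over a set of samples."""
    inters, unions, ious = [], [], []
    for pred, gt in zip(preds, gts):
        inter, union = intersection_and_union(pred, gt)
        inters.append(inter)
        unions.append(union)
        # Assumption: a sample whose union is empty scores an IoU of 0.
        ious.append(inter / union if union else 0.0)

    n = len(ious)
    total_union = sum(unions)
    results = {
        # Overall IoU pools pixels over all samples before dividing.
        "IoU overall": sum(inters) / total_union if total_union else 0.0,
        # Mean IoU averages the per-sample IoU values.
        "IoU mean": sum(ious) / n,
    }

    # Precision@K: fraction of samples whose IoU exceeds the threshold K.
    for k in (0.5, 0.6, 0.7, 0.8, 0.9):
        results[f"Precision@{k}"] = sum(iou > k for iou in ious) / n

    # AP: precision averaged over thresholds 0.50, 0.55, ..., 0.95.
    thresholds = [t / 100 for t in range(50, 100, 5)]
    results["AP"] = sum(
        sum(iou > t for iou in ious) / n for t in thresholds
    ) / len(thresholds)
    return results


# Two toy samples: one perfect prediction, one that over-segments.
example_preds = [[1, 1, 0, 0], [1, 1, 1, 0]]
example_gts = [[1, 1, 0, 0], [1, 0, 0, 0]]
example_results = evaluate(example_preds, example_gts)
```

Overall IoU weights large objects more heavily because it pools pixels across samples, while mean IoU treats every sample equally. That difference is one reason the two values in the table can diverge.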

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/actor-and-action-modular-network-for-text/referring-expression-segmentation-on-j-hmdb)](https://paperswithcode.com/sota/referring-expression-segmentation-on-j-hmdb?p=actor-and-action-modular-network-for-text)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/actor-and-action-modular-network-for-text/referring-expression-segmentation-on-a2d)](https://paperswithcode.com/sota/referring-expression-segmentation-on-a2d?p=actor-and-action-modular-network-for-text)`

Actor and Action Modular Network for Text-based Video Segmentation

2 Nov 2020 · Jianhua Yang, Yan Huang, Kai Niu, Linjiang Huang, Zhanyu Ma, Liang Wang

Text-based video segmentation aims to segment an actor in a video sequence, where a textual query specifies the actor and the action it performs. Previous methods fail to explicitly align the video content with the textual query in a fine-grained manner according to the actor and its action, because of the problem of semantic asymmetry. Semantic asymmetry means that the two modalities contain different amounts of semantic information during the multi-modal fusion process. To alleviate this problem, we propose a novel actor and action modular network that localizes the actor and its action individually in two separate modules. Specifically, we first learn the actor-related and action-related content from the video and the textual query, and then match them in a symmetrical manner to localize the target tube. The target tube, which contains the desired actor and action, is then fed into a fully convolutional network to predict segmentation masks of the actor. Our method also associates objects across multiple frames with the proposed temporal proposal aggregation mechanism. This enables our method to segment the video effectively and keep the predictions temporally consistent. The whole model allows joint learning of actor-action matching and segmentation, and achieves state-of-the-art performance for both single-frame segmentation and full video segmentation on the A2D Sentences and J-HMDB Sentences datasets.

Code

No code implementations yet.

Tasks

Action Segmentation

Action Understanding

Referring Expression Segmentation

Segmentation

Semantic Segmentation

Video Segmentation

Video Semantic Segmentation

Datasets

MS COCO

JHMDB

A2D

A2D Sentences

Results from the Paper

Ranked #9 on Referring Expression Segmentation on J-HMDB


Methods

No methods listed for this paper.