Context Modulated Dynamic Networks for Actor and Action Video Segmentation with Language Queries

3 Apr 2020  ·  Hao Wang, Cheng Deng, Fan Ma, Yi Yang ·

Actor and action video segmentation with language queries aims to segment the objects in a video referred to by a natural-language expression. This task requires comprehensive language reasoning and fine-grained video understanding. Previous methods mainly leverage dynamic convolutional networks to match visual and semantic representations. However, dynamic convolution neglects spatial context when processing each region of a frame, making it difficult to separate similar objects in complex scenes. To address this limitation, we construct a context modulated dynamic convolutional network. Specifically, we propose a context modulated dynamic convolutional operation in which the kernels for a specific region are generated from both the language sentence and the surrounding context features. Moreover, we devise a temporal encoder that incorporates motion into the visual features to better match the query descriptions. Extensive experiments on two benchmark datasets, Actor-Action Dataset Sentences (A2D Sentences) and J-HMDB Sentences, demonstrate that our proposed approach notably outperforms state-of-the-art methods.
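The core idea, per the abstract, is that each region's dynamic kernel is generated from the language sentence *and* that region's surrounding context, rather than from the sentence alone. A minimal NumPy sketch of that idea is below; the pooling window, the projection matrices `W_lang`/`W_ctx`, and the additive modulation are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def context_modulated_kernels(lang_emb, feat_map, ctx_size=3):
    """Generate a per-location 1x1 dynamic kernel from a language embedding,
    modulated by the average-pooled neighborhood around each location.
    Hypothetical sketch: the real model's kernel generator differs."""
    C, H, W = feat_map.shape
    D = lang_emb.shape[0]

    # Local context: mean over a ctx_size x ctx_size window at each pixel.
    pad = ctx_size // 2
    padded = np.pad(feat_map, ((0, 0), (pad, pad), (pad, pad)), mode="edge")
    ctx = np.zeros_like(feat_map)
    for i in range(H):
        for j in range(W):
            ctx[:, i, j] = padded[:, i:i + ctx_size, j:j + ctx_size].mean(axis=(1, 2))

    # Illustrative (randomly initialized) projections into kernel space;
    # in a trained model these would be learned parameters.
    rng = np.random.default_rng(0)
    W_lang = rng.standard_normal((C, D)) * 0.1   # language -> kernel
    W_ctx = rng.standard_normal((C, C)) * 0.1    # context  -> kernel offset

    # One shared language-derived kernel, shifted per location by its context.
    k_lang = W_lang @ lang_emb                                   # (C,)
    return k_lang[:, None, None] + np.einsum("cd,dhw->chw", W_ctx, ctx)  # (C,H,W)

def dynamic_conv_response(feat_map, kernels):
    """Per-location inner product between features and their dynamic kernel,
    yielding a segmentation response map."""
    return np.einsum("chw,chw->hw", feat_map, kernels)
```

Because the kernel now varies across locations, two visually similar regions with different surroundings receive different kernels, which is exactly the property plain (spatially shared) dynamic convolution lacks.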

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Referring Expression Segmentation | A2D Sentences | CMDy | Precision@0.5 | 0.607 | #17 |
| Referring Expression Segmentation | A2D Sentences | CMDy | Precision@0.6 | 0.525 | #19 |
| Referring Expression Segmentation | A2D Sentences | CMDy | Precision@0.7 | 0.405 | #19 |
| Referring Expression Segmentation | A2D Sentences | CMDy | Precision@0.8 | 0.235 | #19 |
| Referring Expression Segmentation | A2D Sentences | CMDy | Precision@0.9 | 0.045 | #20 |
| Referring Expression Segmentation | A2D Sentences | CMDy | AP | 0.333 | #16 |
| Referring Expression Segmentation | A2D Sentences | CMDy | IoU overall | 0.623 | #18 |
| Referring Expression Segmentation | A2D Sentences | CMDy | IoU mean | 0.531 | #16 |
| Referring Expression Segmentation | J-HMDB | CMDy | Precision@0.5 | 0.742 | #14 |
| Referring Expression Segmentation | J-HMDB | CMDy | Precision@0.6 | 0.587 | #15 |
| Referring Expression Segmentation | J-HMDB | CMDy | Precision@0.7 | 0.316 | #15 |
| Referring Expression Segmentation | J-HMDB | CMDy | Precision@0.8 | 0.047 | #16 |
| Referring Expression Segmentation | J-HMDB | CMDy | Precision@0.9 | 0.000 | #11 |
| Referring Expression Segmentation | J-HMDB | CMDy | AP | 0.301 | #10 |
| Referring Expression Segmentation | J-HMDB | CMDy | IoU overall | 0.554 | #16 |
| Referring Expression Segmentation | J-HMDB | CMDy | IoU mean | 0.576 | #13 |
