TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Referring Expression Segmentation	A2D Sentences	VLIDE	Precision@0.5	0.702	# 10
Referring Expression Segmentation	A2D Sentences	VLIDE	Precision@0.9	0.151	# 8
Referring Expression Segmentation	A2D Sentences	VLIDE	IoU overall	0.714	# 7
Referring Expression Segmentation	A2D Sentences	VLIDE	IoU mean	0.598	# 10
Referring Expression Segmentation	A2D Sentences	VLIDE	Precision@0.6	0.663	# 9
Referring Expression Segmentation	A2D Sentences	VLIDE	Precision@0.7	0.585	# 8
Referring Expression Segmentation	A2D Sentences	VLIDE	Precision@0.8	0.428	# 8
Referring Expression Segmentation	A2D Sentences	VLIDE	AP	0.469	# 6
Referring Expression Segmentation	J-HMDB	VLIDE	Precision@0.5	0.874	# 7
Referring Expression Segmentation	J-HMDB	VLIDE	Precision@0.6	0.791	# 7
Referring Expression Segmentation	J-HMDB	VLIDE	Precision@0.7	0.586	# 5
Referring Expression Segmentation	J-HMDB	VLIDE	Precision@0.8	0.182	# 3
Referring Expression Segmentation	J-HMDB	VLIDE	Precision@0.9	0.30	# 2
Referring Expression Segmentation	J-HMDB	VLIDE	AP	0.441	# 3
Referring Expression Segmentation	J-HMDB	VLIDE	IoU overall	0.68	# 5
Referring Expression Segmentation	J-HMDB	VLIDE	IoU mean	0.666	# 6
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	VLIDE	J&F	49.56	# 24
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	VLIDE	J	48.44	# 23
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	VLIDE	F	50.67	# 23
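
The A2D Sentences and J-HMDB scores above follow the usual referring-segmentation protocol. Overall IoU divides the total intersection by the total union accumulated over all test samples, mean IoU averages the per-sample IoU, and Precision@K is the fraction of samples whose IoU exceeds the threshold K. On Refer-YouTube-VOS, J&F is the mean of region similarity J and contour accuracy F. The minimal sketch below computes these from per-sample (intersection, union) pixel counts. The function names are hypothetical. It assumes every union is non-empty and uses a strict "IoU > K" comparison (implementations vary on this). AP, which averages precision over a range of IoU thresholds, is left out.

```python
from typing import Dict, Sequence, Tuple


def iou_metrics(
    inter_union: Sequence[Tuple[int, int]],
    thresholds: Sequence[float] = (0.5, 0.6, 0.7, 0.8, 0.9),
) -> Tuple[float, float, Dict[float, float]]:
    """Return (overall IoU, mean IoU, {K: Precision@K}).

    inter_union holds one (intersection, union) pixel-count pair per test
    sample; every union is assumed to be > 0 (non-empty ground-truth mask).
    """
    ious = [inter / union for inter, union in inter_union]
    # Overall IoU pools pixels over the whole test set before dividing.
    overall = sum(i for i, _ in inter_union) / sum(u for _, u in inter_union)
    # Mean IoU gives every sample the same weight.
    mean = sum(ious) / len(ious)
    # Precision@K: fraction of samples whose IoU is strictly above K (assumed).
    precision = {k: sum(iou > k for iou in ious) / len(ious) for k in thresholds}
    return overall, mean, precision


def j_and_f(j: float, f: float) -> float:
    """J&F: the mean of region similarity J and contour accuracy F."""
    return (j + f) / 2
```

Because overall IoU pools pixels across samples, large objects dominate it, while mean IoU weights every sample equally. This is one reason the two can differ noticeably (0.714 vs 0.598 on A2D Sentences). As a consistency check, `j_and_f(48.44, 50.67)` is about 49.555, which rounds to the J&F of 49.56 reported for VLIDE on Refer-YouTube-VOS.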

Deeply Interleaved Two-Stream Encoder for Referring Video Segmentation

30 Mar 2022 · Guang Feng, Lihe Zhang, Zhiwei Hu, Huchuan Lu

Referring video segmentation aims to segment the video object described by a language expression. To address this task, we first design a two-stream encoder that hierarchically extracts CNN-based visual features and transformer-based linguistic features, and we insert a vision-language mutual guidance (VLMG) module into the encoder multiple times to promote the hierarchical and progressive fusion of multi-modal features. Compared with existing multi-modal fusion methods, this two-stream encoder takes multi-granularity linguistic context into account and achieves deep interleaving between modalities with the help of VLMG. To promote temporal alignment between frames, we further propose a language-guided multi-scale dynamic filtering (LMDF) module that strengthens temporal coherence: it uses language-guided spatial-temporal features to generate a set of position-specific dynamic filters, which update the features of the current frame more flexibly and effectively. Extensive experiments on four datasets verify the effectiveness of the proposed model.
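
The abstract describes LMDF only at a high level, so the following is not the authors' module. It is a minimal pure-Python sketch of the general technique of position-specific dynamic filtering, under these assumptions: feature and guidance maps are single-channel grids of the same size; the guidance map stands in for the language-guided spatial-temporal features and is given directly; the hypothetical generator `kernels_from_guidance` turns the guidance around each position into a softmax-normalised 3 x 3 kernel; "multi-scale" is modelled as dilation rates 1 and 2; and the filtered maps are added residually to the current-frame feature. Unlike an ordinary convolution, which slides one shared kernel over the whole map, every position here gets its own kernel.

```python
import math
from typing import List, Sequence

Grid = List[List[float]]  # single-channel H x W map (simplifying assumption)


def kernels_from_guidance(guidance: Grid, k: int = 3, dilation: int = 1) -> List[List[Grid]]:
    """Hypothetical filter generator: build one k x k kernel per position.

    Each kernel is a softmax over the guidance values sampled on a dilated
    k x k grid around that position; out-of-bounds taps get weight 0.
    """
    h, w, r = len(guidance), len(guidance[0]), k // 2
    kernels: List[List[Grid]] = []
    for y in range(h):
        row: List[Grid] = []
        for x in range(w):
            kernel = [[0.0] * k for _ in range(k)]
            total = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + (i - r) * dilation, x + (j - r) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        kernel[i][j] = math.exp(guidance[yy][xx])
                        total += kernel[i][j]
            # The centre tap is always in bounds, so total > 0.
            row.append([[v / total for v in kr] for kr in kernel])
        kernels.append(row)
    return kernels


def position_specific_filter(feature: Grid, kernels: List[List[Grid]], dilation: int = 1) -> Grid:
    """Apply kernels[y][x] at position (y, x) only, with zero padding."""
    h, w = len(feature), len(feature[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            kernel = kernels[y][x]
            k = len(kernel)
            r = k // 2
            acc = 0.0
            for i in range(k):
                for j in range(k):
                    yy, xx = y + (i - r) * dilation, x + (j - r) * dilation
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * feature[yy][xx]
            out[y][x] = acc
    return out


def multi_scale_update(feature: Grid, guidance: Grid, dilations: Sequence[int] = (1, 2)) -> Grid:
    """Average the filtered maps over several dilation rates, then add the
    result residually to the current-frame feature (the residual is assumed)."""
    h, w = len(feature), len(feature[0])
    acc = [[0.0] * w for _ in range(h)]
    for d in dilations:
        filtered = position_specific_filter(feature, kernels_from_guidance(guidance, 3, d), d)
        for y in range(h):
            for x in range(w):
                acc[y][x] += filtered[y][x] / len(dilations)
    return [[feature[y][x] + acc[y][x] for x in range(w)] for y in range(h)]
```

For example, with `feature = [[1.0] * 4 for _ in range(4)]` and any guidance map, every kernel sums to one, so each filtered value stays 1.0 and `multi_scale_update` returns 2.0 everywhere; the guidance only changes how neighbours are weighted, which matters once the feature map is not constant. In a trained model the filter generator would typically be a learned layer rather than a fixed softmax.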

Code

No code implementations yet.

Tasks

Referring Expression Segmentation

Video Segmentation

Video Semantic Segmentation

Vocal Bursts Valence Prediction

Datasets

DAVIS

JHMDB

Referring Expressions for DAVIS 2016 & 2017

A2D

Refer-YouTube-VOS

A2D Sentences

Results from the Paper

Ranked #3 on Referring Expression Segmentation on J-HMDB

Methods

No methods listed for this paper.
