Deep Transport Network for Unsupervised Video Object Segmentation

Popular unsupervised video object segmentation methods fuse the RGB frame and optical flow via a two-stream network. However, they cannot handle the distracting noise in either input modality, which may severely degrade model performance. We propose to establish correspondence between the input modalities, while suppressing distracting signals, via optimal structural matching. Given a video frame, we extract dense local features from the RGB image and the optical flow, and treat them as two complex structured representations. The Wasserstein distance is then employed to compute the globally optimal flows that transport the features of one modality to the other, where the magnitude of each flow measures the degree of alignment between two local features. To plug structural matching into a two-stream network for end-to-end training, we factorize the input cost matrix into small spatial blocks and design a differentiable long-short Sinkhorn module consisting of a long-distance Sinkhorn layer and a short-distance Sinkhorn layer (the core transport step is sketched below). We integrate the module into a dedicated two-stream network and dub our model TransportNet. Our experiments show that aligning motion and appearance yields state-of-the-art results on popular video object segmentation datasets.
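To make the matching step concrete, the sketch below computes an entropic-regularized optimal transport plan between dense flow and RGB features with log-domain Sinkhorn iterations, then uses the plan to transport RGB features onto the flow stream. This is a minimal PyTorch sketch under stated assumptions, not the authors' released code: the function names (`sinkhorn`, `transport_features`), the cosine cost, and the hyper-parameters `eps` and `n_iters` are illustrative choices, and the block factorization of the long-short Sinkhorn module is omitted.

```python
# Minimal sketch (not the authors' code) of differentiable Sinkhorn-based
# feature transport between the flow and RGB streams. All names and
# hyper-parameters here are illustrative assumptions.
import math
import torch
import torch.nn.functional as F

def sinkhorn(cost, eps=0.05, n_iters=50):
    """Log-domain Sinkhorn iterations for entropic-regularized optimal transport.

    cost: (B, N, M) pairwise cost between N flow features and M RGB features.
    Returns a (B, N, M) transport plan with uniform marginals; large entries
    mark well-aligned feature pairs, small entries suppress noisy matches.
    """
    B, N, M = cost.shape
    log_mu = cost.new_full((B, N), -math.log(N))   # uniform source marginal (log)
    log_nu = cost.new_full((B, M), -math.log(M))   # uniform target marginal (log)
    log_K = -cost / eps                            # Gibbs kernel in log space
    u = torch.zeros_like(log_mu)
    v = torch.zeros_like(log_nu)
    for _ in range(n_iters):                       # alternating dual updates
        u = log_mu - torch.logsumexp(log_K + v.unsqueeze(1), dim=2)
        v = log_nu - torch.logsumexp(log_K + u.unsqueeze(2), dim=1)
    return torch.exp(log_K + u.unsqueeze(2) + v.unsqueeze(1))

def transport_features(rgb_feat, flow_feat, eps=0.05):
    """Transport RGB features onto flow-feature positions via the OT plan.

    rgb_feat, flow_feat: (B, C, H, W) dense local features from the two streams.
    """
    B, C, H, W = rgb_feat.shape
    rgb = F.normalize(rgb_feat.flatten(2).transpose(1, 2), dim=-1)    # (B, HW, C)
    flow = F.normalize(flow_feat.flatten(2).transpose(1, 2), dim=-1)  # (B, HW, C)
    cost = 1.0 - torch.bmm(flow, rgb.transpose(1, 2))                 # cosine cost, (B, HW, HW)
    plan = sinkhorn(cost, eps=eps)                                    # (B, HW, HW)
    aligned = (H * W) * torch.bmm(plan, rgb)   # barycentric projection; rows of plan sum to 1/HW
    return aligned.transpose(1, 2).reshape(B, C, H, W)
```

Note that running Sinkhorn on the full HW x HW cost matrix is expensive, which is presumably what motivates the paper's factorization into small spatial blocks; the sketch above leaves out that long-short decomposition and shows only the core matching step.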

Task                                      Dataset           Model          Metric   Value   Global Rank
Unsupervised Video Object Segmentation   DAVIS 2016 val    TransportNet   G        84.8    #11
Unsupervised Video Object Segmentation   DAVIS 2016 val    TransportNet   J        84.5    #10
Unsupervised Video Object Segmentation   DAVIS 2016 val    TransportNet   F        85.0    #10
Unsupervised Video Object Segmentation   FBMS test         TransportNet   J        78.7    #4
