Pixel-Level Matching for Video Object Segmentation using Convolutional Neural Networks

We propose a novel video object segmentation algorithm based on pixel-level matching using Convolutional Neural Networks (CNNs). Our network aims to distinguish the target area from the background on the basis of the pixel-level similarity between two object units. The proposed network represents a target object using features from layers of different depths in order to take advantage of both spatial details and category-level semantic information. Furthermore, we propose a feature compression technique that drastically reduces the memory requirements while maintaining the capability of feature representation. Two-stage training (pre-training and fine-tuning) allows our network to handle any target object regardless of its category (even if the object's type does not belong to the pre-training data) or of variations in its appearance through a video sequence. Experiments on large datasets demonstrate the effectiveness of our model, compared with related methods, in terms of accuracy, speed, and stability. Finally, we demonstrate the transferability of our network to different domains, such as the infrared data domain.
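The core ideas in the abstract (per-pixel matching between a target template and the current frame, features drawn from layers of different depths, and channel compression before matching) can be illustrated with a short sketch. The VGG-16 backbone, layer split points, channel sizes, and cosine-similarity scoring below are assumptions made for illustration, not the paper's exact architecture.

```python
# A minimal sketch (PyTorch) of per-pixel matching between a target template
# and a query frame: features from a shallow and a deep layer, 1x1-conv
# "feature compression", and a per-pixel similarity score. The backbone,
# split points, and scoring rule are illustrative assumptions.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16


class PixelMatcher(torch.nn.Module):
    def __init__(self, compressed_channels=64):
        super().__init__()
        backbone = vgg16(weights=None).features
        self.shallow = backbone[:10]    # conv1-conv2 blocks: spatial detail (128 ch)
        self.deep = backbone[10:23]     # conv3-conv4 blocks: semantics (512 ch)
        # Feature compression: 1x1 convolutions shrink the channel dimension
        # before matching, reducing the memory needed for the similarity map.
        self.compress_shallow = torch.nn.Conv2d(128, compressed_channels, 1)
        self.compress_deep = torch.nn.Conv2d(512, compressed_channels, 1)

    def embed(self, frame):
        s = self.shallow(frame)
        d = self.deep(s)
        # Bring deep features to the shallow resolution and fuse both depths.
        d = F.interpolate(d, size=s.shape[-2:], mode="bilinear",
                          align_corners=False)
        feat = torch.cat([self.compress_shallow(s), self.compress_deep(d)], dim=1)
        return F.normalize(feat, dim=1)  # unit-length per-pixel descriptors

    def forward(self, template, query):
        ft = self.embed(template)                         # B x C x Ht x Wt
        fq = self.embed(query)                            # B x C x Hq x Wq
        b, _, hq, wq = fq.shape
        # Pixel-level matching: cosine similarity of every query pixel
        # against every template pixel; the best match is a foreground score.
        sim = torch.einsum("bcn,bcm->bnm",
                           ft.flatten(2), fq.flatten(2))  # B x Nt x Nq
        return sim.max(dim=1).values.view(b, 1, hq, wq)


matcher = PixelMatcher()
template = torch.randn(1, 3, 224, 224)  # crop of the target object (frame 1)
query = torch.randn(1, 3, 224, 224)     # current frame to segment
heatmap = matcher(template, query)      # 1 x 1 x 56 x 56 foreground score map
```

The 1x1 compression layers are the part the abstract refers to as reducing memory while keeping representational power: without them, the per-pixel similarity tensor would have to be computed over the full backbone channel width.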

ICCV 2017

Datasets


Task: Semi-Supervised Video Object Segmentation
Dataset: DAVIS 2016
Model: PLM

Metric               Value    Global Rank
Jaccard (Mean)       70.2     #71
Jaccard (Recall)     86.3     #29
Jaccard (Decay)      11.2     #10
F-measure (Mean)     62.5     #74
F-measure (Recall)   73.2     #29
F-measure (Decay)    14.7     #5
J&F                  66.35    #73
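For reference, the rows above follow the DAVIS evaluation protocol: J (Jaccard) is the per-frame region IoU, F is a boundary accuracy measure, and each is summarized by its mean, its recall at a 0.5 threshold, and its decay over a sequence; J&F is the average of the two means (here (70.2 + 62.5) / 2 = 66.35). The sketch below computes the Jaccard statistics from per-frame masks; the official DAVIS toolkit is the authoritative implementation, and the quartile handling here is simplified.

```python
# A small sketch of how DAVIS-style region statistics (Mean / Recall / Decay)
# are typically computed from per-frame binary masks. Simplified illustration;
# the official DAVIS benchmark code is the reference implementation.
import numpy as np

def jaccard(pred, gt):
    """Per-frame region similarity J = |pred AND gt| / |pred OR gt|."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                      # both prediction and ground truth empty
        return 1.0
    return np.logical_and(pred, gt).sum() / union

def sequence_statistics(per_frame_j):
    """Mean / Recall / Decay over one sequence of per-frame J values."""
    j = np.asarray(per_frame_j, dtype=float)
    mean = j.mean()
    recall = (j > 0.5).mean()           # fraction of well-segmented frames
    quartiles = np.array_split(j, 4)    # simplified temporal quartile split
    decay = quartiles[0].mean() - quartiles[-1].mean()
    return mean, recall, decay

# Toy usage: random masks standing in for predictions and ground truth.
rng = np.random.default_rng(0)
frames = [(rng.random((64, 64)) > 0.5, rng.random((64, 64)) > 0.5)
          for _ in range(20)]
per_frame = [jaccard(p, g) for p, g in frames]
print(sequence_statistics(per_frame))   # per-sequence (mean, recall, decay)
```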

Methods


No methods listed for this paper.