The semi-supervised scenario assumes the user inputs a full mask of the object of interest in the first frame of a video sequence. Methods have to produce the segmentation mask for that object in the subsequent frames.
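This protocol can be summarized as a simple per-frame propagation loop. The sketch below is illustrative only; `propagate` is a hypothetical placeholder for whatever per-frame model a method uses, not any specific system described here.

```python
import numpy as np

def segment_video(frames, first_mask, propagate):
    """Semi-supervised VOS protocol: the first-frame mask is given;
    the method must predict a mask for every subsequent frame.
    `propagate(frame, prev_mask)` stands in for any per-frame model."""
    masks = [first_mask]
    for frame in frames[1:]:
        masks.append(propagate(frame, masks[-1]))
    return masks

# Toy run: a trivial "propagate" that just copies the previous mask.
frames = [np.zeros((4, 4)) for _ in range(3)]
first_mask = np.array([[0, 1], [1, 1]])
out = segment_video(frames, first_mask, lambda f, m: m.copy())
```

A real method would replace the lambda with a learned model conditioned on the current frame.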
In this paper we illustrate how to perform both visual object tracking and semi-supervised video object segmentation, in real time, with a single, simple approach.
This paper tackles the task of semi-supervised video object segmentation, i.e., the separation of an object from the background in a video, given the mask of the first frame.
Multi-object video object segmentation is a challenging task, especially in the zero-shot case, where no object mask is given at the initial frame and the model must find the objects to be segmented throughout the sequence.
Specifically, to integrate the insights of matching based and propagation based methods, we employ an encoder-decoder framework to learn pixel-level similarity and segmentation in an end-to-end manner.
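The matching-based idea can be illustrated with a minimal sketch: each query pixel is labeled by its nearest reference pixel in feature space under cosine similarity. The function name `match_segment` and the use of raw nearest-neighbor transfer are assumptions for illustration; the actual encoder-decoder learns this similarity end-to-end rather than applying a fixed rule.

```python
import numpy as np

def match_segment(ref_feat, ref_mask, query_feat):
    """Matching-based segmentation sketch: transfer the label of the
    best-matching reference pixel (cosine similarity) to each query pixel.
    ref_feat, query_feat: (H, W, C) feature maps; ref_mask: (H, W) binary."""
    C = ref_feat.shape[-1]
    r = ref_feat.reshape(-1, C)
    q = query_feat.reshape(-1, C)
    # L2-normalize so the dot product is cosine similarity.
    r = r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-8)
    q = q / (np.linalg.norm(q, axis=1, keepdims=True) + 1e-8)
    sim = q @ r.T                  # (HW_query, HW_ref) pixel-level similarity
    nearest = sim.argmax(axis=1)   # best-matching reference pixel per query pixel
    return ref_mask.reshape(-1)[nearest].reshape(query_feat.shape[:2])

# Toy check: matching a frame against itself reproduces its own mask.
ref_feat = np.array([[[1., 0.], [1., 0.]],
                     [[0., 1.], [0., 1.]]])
ref_mask = np.array([[1, 1], [0, 0]])
pred = match_segment(ref_feat, ref_mask, ref_feat)
```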
We validate our method on four benchmark sets that cover single and multiple object segmentation.
We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations.
In our framework, the past frames with object masks form an external memory, and the current frame, acting as the query, is segmented using the mask information stored in the memory.
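The memory read described above can be sketched as soft attention: each query pixel attends over all memory pixels and retrieves a weighted sum of memory values that encode the stored masks. This is a simplified numpy sketch under assumed shapes, not the actual network; `memory_read` and its arguments are hypothetical names.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_read(mem_keys, mem_vals, query_keys):
    """Memory read as attention (sketch): every query pixel attends over
    all memory pixels and retrieves mask information from the values.
    mem_keys: (N, Ck), mem_vals: (N, Cv), query_keys: (M, Ck)."""
    logits = query_keys @ mem_keys.T / np.sqrt(mem_keys.shape[1])
    attn = softmax(logits, axis=1)   # (M, N) attention over memory pixels
    return attn @ mem_vals           # (M, Cv) retrieved value per query pixel

# Toy run: a query matching the first memory key retrieves (mostly) its value.
mem_keys = np.array([[10., 0.], [0., 10.]])
mem_vals = np.array([[1., 0.], [0., 1.]])
read = memory_read(mem_keys, mem_vals, np.array([[10., 0.]]))
```

In the full framework, keys and values are learned embeddings of frames and masks, and the retrieved values are decoded into the query frame's mask.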
Semi-supervised video object segmentation has made significant progress on real and challenging videos in recent years.