One-shot visual object segmentation
27 papers with code • 1 benchmark • 1 dataset
We address semi-supervised video object segmentation, the task of automatically generating accurate and consistent pixel masks for objects in a video sequence, given the first-frame ground truth annotations.
End-to-end sequential learning to explore spatial-temporal features for video segmentation is largely limited by the scale of available video segmentation datasets, i.e., even the largest video segmentation dataset contains only 90 short video clips.
In our framework, the past frames with object masks form an external memory, and the current frame, as the query, is segmented using the mask information in the memory.
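The memory-and-query idea in this snippet can be illustrated with a minimal sketch of an attention-style memory read. It is not taken from the paper: the function name `memory_read`, the flattened per-pixel key/value layout, and the scaled dot-product similarity are all assumptions chosen for illustration. Each query pixel compares its key against every memory pixel's key and retrieves a weighted average of the memory values, which would carry the mask information from past frames.

```python
import numpy as np

def memory_read(query_key, memory_keys, memory_values):
    """Soft-attention read from a memory of past frames (illustrative only).

    query_key:     (N_q, C_k) keys for the pixels of the current frame
    memory_keys:   (N_m, C_k) keys for the pixels of all memory frames
    memory_values: (N_m, C_v) values encoding appearance and mask information
    Returns the (N_q, C_v) readout and the (N_q, N_m) attention weights.
    """
    # Scaled dot-product similarity between every query and memory pixel
    # (the scaling is an assumption, not something stated in the snippet).
    scores = query_key @ memory_keys.T / np.sqrt(query_key.shape[1])
    # Numerically stable softmax over the memory dimension.
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Each query pixel retrieves a weighted average of memory values.
    return weights @ memory_values, weights

# Toy usage: 4 memory pixels, 2 query pixels that match memory pixels 2 and 0.
memory_keys = 20.0 * np.eye(4)
memory_values = np.arange(8, dtype=float).reshape(4, 2)
query_key = memory_keys[[2, 0]]
readout, weights = memory_read(query_key, memory_keys, memory_values)
```

In a full model the readout would typically be passed to a decoder that predicts the mask for the current frame; that part is omitted here.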
The target appearance model consists of a light-weight module, which is learned during the inference stage using fast optimization techniques to predict a coarse but robust target segmentation.
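As a rough illustration of a lightweight target model fitted at inference time, the sketch below learns a linear classifier on first-frame pixel features and applies it to produce coarse per-pixel scores. The snippet does not say which optimizer is used, so closed-form ridge regression is substituted here as a stand-in; the names `fit_target_model` and `predict_coarse_scores` and the feature layout are hypothetical.

```python
import numpy as np

def _with_bias(features):
    # Append a constant column so the linear model has a bias term.
    return np.hstack([features, np.ones((features.shape[0], 1))])

def fit_target_model(features, mask, reg=1e-2):
    """Fit a linear target model on first-frame features (ridge regression).

    features: (N, C) per-pixel feature vectors from the annotated frame
    mask:     (N,)   ground-truth labels, 1 for target and 0 for background
    reg:      ridge penalty; applied to all weights, including the bias,
              for simplicity
    """
    X = _with_bias(features)
    A = X.T @ X + reg * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ mask)

def predict_coarse_scores(features, weights):
    """Return per-pixel target scores; thresholding gives a coarse mask."""
    return _with_bias(features) @ weights

# Toy usage: two target pixels and two background pixels in a 2-D feature space.
features = np.array([[2.0, 0.0], [2.1, 0.1], [-2.0, 0.0], [-1.9, -0.1]])
mask = np.array([1.0, 1.0, 0.0, 0.0])
weights = fit_target_model(features, mask)
coarse_mask = predict_coarse_scores(features, weights) > 0.5
```

A closed-form solve is only practical for small feature dimensions; an iterative optimizer would be the natural replacement for larger models.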
This paper investigates the principles of embedding learning to tackle the challenging task of semi-supervised video object segmentation.
This allows us to achieve a rich internal representation of the target in the current frame, significantly increasing the segmentation accuracy of our approach.
State-of-the-art methods learn to decode features with a single positive object and thus have to match and segment each target separately in multi-object scenarios, consuming several times the computing resources.
One of the fundamental challenges in video object segmentation is to find an effective representation of the target and background appearance.