Multi-object video object segmentation is a challenging task, especially in the zero-shot case, where no object mask is given for the initial frame and the model has to find the objects to be segmented throughout the sequence.
This paper conducts a systematic study on the role of visual attention in Unsupervised Video Object Segmentation (UVOS) tasks.
We introduce a novel network, called CO-attention Siamese Network (COSNet), to address the unsupervised video object segmentation task from a holistic view.
This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).
Moving objects are detected in an unsupervised way by exploiting structure from motion.
Specifically, we present an effective video saliency detector that consists of a spatial refinement network and a spatiotemporal module.
We investigate the problem of strictly unsupervised video object segmentation, i.e., the separation of a primary object from the background in a video without a user-provided object mask or any training on an annotated dataset.
Moreover, our method achieves better performance than the best unsupervised offline algorithm on the DAVIS-2016 benchmark dataset.
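For context, segmentation quality on DAVIS-2016 is commonly reported with the region similarity J, which is the intersection-over-union (Jaccard index) between a predicted mask and the ground-truth mask. The following is a minimal sketch of that measure, assuming binary masks given as nested lists of 0/1 values; the function name `region_similarity` and the empty-mask convention are illustrative assumptions, not taken from any of the works listed above or from the official benchmark code.

```python
def region_similarity(pred, gt):
    """Jaccard index (IoU) between two binary masks of equal shape."""
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("masks must have the same shape")
    inter = 0
    union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            p, g = bool(p), bool(g)
            inter += p and g   # pixel is foreground in both masks
            union += p or g    # pixel is foreground in at least one mask
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if union == 0 else inter / union


# Hypothetical 2x3 masks: 2 overlapping pixels, 4 pixels in the union.
pred = [[0, 1, 1],
        [0, 1, 0]]
gt = [[0, 1, 0],
      [0, 1, 1]]
print(region_similarity(pred, gt))  # 0.5
```

A per-sequence score would typically average this value over the annotated frames; that aggregation is omitted here.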