Previous video object segmentation approaches mainly focus on using simplex solutions between appearance and motion, limiting feature collaboration efficiency among and across these two cues.
To tackle this dilemma and also inspired by the fact that depth quality is a key factor influencing the accuracy, we propose a novel depth quality-inspired feature manipulation (DQFM) process, which is efficient itself and can serve as a gating mechanism for filtering depth features to greatly boost the accuracy.
Depth information has been proved beneficial in RGB-D salient object detection (SOD).
Ranked #1 on RGB-D Salient Object Detection on NJU2K
The proposed model, named RD3D, aims at pre-fusion in the encoder stage and in-depth fusion in the decoder stage to effectively promote the full integration of RGB and depth streams.
Finally, we propose an effective layer-wise aggregation module to fuse the features extracted from the enhanced depth maps and RGB images for the accurate detection of salient objects.
Secondly, we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, from which insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models, are achieved.
Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture.
Ranked #2 on RGB-D Salient Object Detection on SIP (using extra training data)
This paper proposes a novel joint learning and densely-cooperative fusion (JL-DCF) architecture for RGB-D salient object detection.
Ranked #5 on RGB-D Salient Object Detection on NLPR
To improve the image quality, we propose an effective many-to-many mapping framework for unsupervised multi-domain image-to-image translation.
It utilizes both the foreground and background information, and imposes a local coordinate constraint, where the basis matrix is sparse matrix from the linear combination of candidates with corresponding nonnegative coefficient vectors.
In the teaching-to-learn step, a teacher is designed to arrange the regions from simple to difficult and then assign the simplest regions to the learner.