In particular, we 1) propose a bifurcated backbone strategy (BBS) to split the multi-level features into teacher and student features, and 2) utilize a depth-enhanced module (DEM) to excavate informative parts of depth cues from the channel and spatial views.
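A minimal PyTorch sketch of the channel-and-spatial idea behind such a depth-enhanced module (class and parameter names here are illustrative assumptions, not the authors' code):

```python
import torch
import torch.nn as nn

class DepthEnhancedModule(nn.Module):
    """Sketch of a DEM-style block: channel gating, then spatial gating."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel view: squeeze-and-excitation style gating.
        self.channel_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
        # Spatial view: a single-channel attention map from a 7x7 conv.
        self.spatial_gate = nn.Sequential(
            nn.Conv2d(channels, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )

    def forward(self, depth_feat):
        depth_feat = depth_feat * self.channel_gate(depth_feat)  # channel view
        return depth_feat * self.spatial_gate(depth_feat)        # spatial view
```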
Deep neural networks can be roughly divided into deterministic neural networks and stochastic neural networks. The former is usually trained to achieve a mapping from input space to output space via maximum likelihood estimation for the weights, which leads to deterministic predictions during testing.
In this paper, we introduce a novel multi-stage cascaded learning framework via mutual information minimization to "explicitly" model the multi-modal information between the RGB image and depth data.
A cross-enhanced integration module (CIM) is proposed to fuse cross-modal features in the shared learning network, which are then propagated to the next layer for integrating cross-level information.
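One plausible reading of such cross-enhanced integration, sketched in PyTorch (the mutual-gating design is an assumption for illustration, not the published module):

```python
import torch
import torch.nn as nn

class CrossEnhancedIntegration(nn.Module):
    """Hypothetical CIM sketch: each modality gates the other before fusion."""
    def __init__(self, channels):
        super().__init__()
        self.gate_rgb = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.gate_dep = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.Sigmoid())
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, f_rgb, f_dep):
        f_rgb_e = f_rgb + f_rgb * self.gate_dep(f_dep)   # depth enhances RGB
        f_dep_e = f_dep + f_dep * self.gate_rgb(f_rgb)   # RGB enhances depth
        return self.fuse(torch.cat([f_rgb_e, f_dep_e], dim=1))
```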
Among these, the CFM is used to collect the semantic and location information of polyps from high-level features, while the CIM is applied to capture polyp information disguised in low-level features.
Ranked #2 on Medical Image Segmentation on CVC-ColonDB
Previous video object segmentation approaches mainly focus on simplex solutions between appearance and motion, which limits the efficiency of feature collaboration between and across these two cues.
With this goal in mind, we propose PV-SOD, a new task that aims to segment salient objects from panoramic videos.
We hope this work will facilitate state-of-the-art Transformer research in computer vision.
Ranked #24 on Object Detection on COCO minival
We address this problem with a novel Probabilistic Model Distillation (PMD) approach, which transfers knowledge learned by a probabilistic teacher model on synthetic data to a static student model using unlabeled real image pairs.
Thanks to rapid advances in deep learning techniques and the wide availability of large-scale training sets, the performance of video saliency detection models has been improving steadily and significantly.
Existing video polyp segmentation (VPS) models typically employ convolutional neural networks (CNNs) to extract features.
This design bias has led to a saturation in performance for state-of-the-art SOD models when evaluated on existing datasets.
In this paper, we strive to embrace challenges towards effective and efficient COS. To this end, we develop a bio-inspired framework, termed Positioning and Focus Network (PFNet), which mimics the process of predation in nature.
Extensive experimental results on various SOD and COD tasks illustrate that transformer networks can transform SOD and COD, leading to new benchmarks for each related task.
Automatically detecting/segmenting object(s) that blend in with their surroundings is difficult for current models.
We present a new vision-language (VL) pre-training model dubbed Kaleido-BERT, which introduces a novel kaleido strategy for fashion cross-modality representations from transformers.
We present a novel group collaborative learning framework (GCoNet) capable of detecting co-salient objects in real time (16ms), by simultaneously mining consensus representations at the group level based on two necessary criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module; 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module conditioned on the inconsistent consensus.
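A hedged sketch of the group-affinity idea: compute a consensus representation from pairwise similarities within a group (the function and weighting scheme are illustrative, not GCoNet's exact module):

```python
import torch
import torch.nn.functional as F

def group_consensus(feats):
    """feats: (N, C, H, W) features of N images in one group.
    Returns a (C,) consensus vector weighted by pairwise affinity."""
    n, c, h, w = feats.shape
    x = feats.flatten(2).mean(-1)              # (N, C) global descriptors
    x = F.normalize(x, dim=1)
    affinity = x @ x.t()                       # (N, N) cosine affinities
    weights = affinity.mean(dim=1).softmax(0)  # images agreeing with the group weigh more
    return (weights.unsqueeze(1) * x).sum(0)   # (C,) consensus representation
```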
With the above understanding about camouflaged objects, we present the first ranking-based COD network (Rank-Net) to simultaneously localize, segment, and rank camouflaged objects.
Unlike the recently-proposed Transformer model (e.g., ViT) that is specially designed for image classification, we propose Pyramid Vision Transformer (PVT), which overcomes the difficulties of porting Transformer to various dense prediction tasks.
Ranked #66 on Object Detection on COCO minival
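A minimal sketch of the spatial-reduction attention idea that makes PVT affordable for dense prediction: keys and values are computed on a downsampled token set (the names and the use of nn.MultiheadAttention are simplifications, not the official implementation):

```python
import torch
import torch.nn as nn

class SpatialReductionAttention(nn.Module):
    """PVT-style attention sketch: queries see a spatially reduced key/value set."""
    def __init__(self, dim, num_heads=2, sr_ratio=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, h, w):
        # x: (B, H*W, C) token sequence for an (h, w) feature map
        b, n, c = x.shape
        kv = x.transpose(1, 2).reshape(b, c, h, w)
        kv = self.sr(kv).flatten(2).transpose(1, 2)  # (B, H*W/sr^2, C) reduced tokens
        kv = self.norm(kv)
        out, _ = self.attn(x, kv, kv)
        return out

# usage: tokens of a 56x56 map with 64 channels, 4x spatial reduction
# sra = SpatialReductionAttention(64); y = sra(torch.randn(2, 56 * 56, 64), 56, 56)
```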
We present the first systematic study on concealed object detection (COD), which aims to identify objects that are "perfectly" embedded in their background.
Ranked #1 on Camouflaged Object Segmentation on COD
2) Based on the DFA features, we introduce the integrity channel enhancement (ICE) component with the goal of enhancing feature channels that highlight the integral salient objects at the macro level, while suppressing the other distracting ones.
In this paper, we propose a simple yet powerful Boundary-Aware Segmentation Network (BASNet), which comprises a predict-refine architecture and a hybrid loss, for highly accurate image segmentation.
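A hedged sketch of a BASNet-style hybrid loss; the paper combines BCE, SSIM, and IoU terms, and the SSIM term is omitted here for brevity:

```python
import torch
import torch.nn.functional as F

def hybrid_loss(pred_logits, target, eps=1e-6):
    """pred_logits, target: (B, 1, H, W). BCE + soft-IoU sketch of a hybrid loss."""
    bce = F.binary_cross_entropy_with_logits(pred_logits, target)
    prob = torch.sigmoid(pred_logits)
    inter = (prob * target).sum(dim=(2, 3))
    union = (prob + target - prob * target).sum(dim=(2, 3))
    iou = 1.0 - ((inter + eps) / (union + eps)).mean()
    return bce + iou
```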
Spotting objects that are visually adapted to their surroundings is challenging for both humans and AI.
Second, we benchmark nine representative light field SOD models together with several cutting-edge RGB-D SOD models on four widely used light field datasets, and provide insightful discussions and analyses, including a comparison between light field SOD and RGB-D SOD models.
Our framework includes two main models: 1) a generator model, which maps the input image and latent variable to stochastic saliency prediction, and 2) an inference model, which gradually updates the latent variable by sampling it from the true or approximate posterior distribution.
Ranked #1 on RGB Salient Object Detection on ECSSD
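A sketch of how such an inference model might gradually update the latent variable, here via short-run Langevin dynamics on the unnormalized posterior (the `generator(image, z)` interface, which is assumed to output saliency probabilities, and the step sizes are assumptions for illustration):

```python
import torch
import torch.nn.functional as F

def infer_latent(generator, image, gt, z_dim=32, steps=5, step_size=0.1):
    """Gradient-based latent update: ascend log p(gt | image, z) + log p(z)."""
    z = torch.randn(image.size(0), z_dim, device=image.device, requires_grad=True)
    for _ in range(steps):
        pred = generator(image, z)
        log_lik = -F.binary_cross_entropy(pred, gt, reduction='sum')
        log_prior = -0.5 * (z ** 2).sum()           # standard normal prior
        grad, = torch.autograd.grad(log_lik + log_prior, z)
        noise = torch.randn_like(z)                 # Langevin injection term
        z = (z + 0.5 * step_size ** 2 * grad + step_size * noise).detach().requires_grad_(True)
    return z.detach()
```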
Inspired by the observation that RGB and depth modalities actually present certain commonality in distinguishing salient objects, a novel joint learning and densely cooperative fusion (JL-DCF) architecture is designed to learn from both RGB and depth inputs through a shared network backbone, known as the Siamese architecture.
Ranked #2 on RGB-D Salient Object Detection on SIP (using extra training data)
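The Siamese idea above can be sketched in a few lines: one shared backbone processes both the RGB image and the depth map tiled to three channels (the ResNet-50 choice and the additive fusion are placeholders; JL-DCF's densely cooperative fusion is richer):

```python
import torch
import torchvision

# One backbone, two modalities: weight sharing is what makes it "joint learning".
backbone = torchvision.models.resnet50()
backbone.fc = torch.nn.Identity()           # keep the pooled features only

rgb = torch.randn(1, 3, 224, 224)
depth = torch.randn(1, 1, 224, 224).repeat(1, 3, 1, 1)  # tile depth to 3 channels

f_rgb = backbone(rgb)     # (1, 2048)
f_dep = backbone(depth)   # (1, 2048) from the same weights
fused = f_rgb + f_dep     # placeholder for the densely-cooperative fusion
```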
Further, considering that the light field can also provide depth maps, we review SOD models and popular benchmark datasets from this domain as well.
Co-salient object detection (CoSOD) is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images.
Ranked #3 on Co-Salient Object Detection on CoCA
In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS).
Ranked #1 on RGB-D Salient Object Detection on NJU2K
To address these challenges, we propose a parallel reverse attention network (PraNet) for accurate polyp segmentation in colonoscopy images.
Ranked #3 on Camouflaged Object Segmentation on CAMO (using extra training data)
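The reverse-attention idea behind PraNet can be sketched compactly: erase the already-predicted region so subsequent layers focus on missed parts and boundaries (a simplified reading, not the authors' exact module):

```python
import torch

def reverse_attention(feat, coarse_pred):
    """feat: (B, C, H, W) features; coarse_pred: (B, 1, H, W) logits, same resolution.
    Re-weights features toward regions the coarse prediction missed."""
    rev = 1.0 - torch.sigmoid(coarse_pred)   # high where the prediction is low
    return feat * rev
```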
We present a comprehensive study on a new task named camouflaged object detection (COD), which aims to identify objects that are "seamlessly" embedded in their surroundings.
Ranked #2 on Camouflaged Object Segmentation on COD
Co-salient object detection (CoSOD) is a newly emerging and rapidly growing branch of salient object detection (SOD), which aims to detect the co-occurring salient objects in multiple images.
Ranked #1 on Co-Salient Object Detection on CoSOD3k
To better explore salient information in both foreground and background regions, this paper proposes a Bilateral Attention Network (BiANet) for the RGB-D SOD task.
Ranked #3 on RGB-D Salient Object Detection on SIP
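A hedged sketch of the bilateral idea: refine foreground- and background-attended features in parallel branches before merging (the branch design is illustrative, not BiANet's code):

```python
import torch
import torch.nn as nn

class BilateralAttention(nn.Module):
    """Two branches explore foreground and background regions separately."""
    def __init__(self, channels):
        super().__init__()
        self.fg_branch = nn.Conv2d(channels, channels, 3, padding=1)
        self.bg_branch = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, feat, pred_logits):
        fg = torch.sigmoid(pred_logits)          # (B, 1, H, W) foreground map
        f = self.fg_branch(feat * fg)            # attend to salient regions
        b = self.bg_branch(feat * (1.0 - fg))    # attend to background regions
        return feat + f + b                      # residual merge
```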
Coronavirus Disease 2019 (COVID-19) spread globally in early 2020, causing the world to face an existential health crisis.
This paper proposes a novel joint learning and densely-cooperative fusion (JL-DCF) architecture for RGB-D salient object detection.
Ranked #5 on RGB-D Salient Object Detection on NLPR
The chest CT scan provides a valuable complement to the RT-PCR test, and it can identify early-stage patients with high sensitivity.
In this paper, we propose the first framework (UCNet) to employ uncertainty for RGB-D saliency detection by learning from the data labeling process.
Ranked #4 on RGB-D Salient Object Detection on LFSD
In this paper, we design a perceptual metric, called Structure Co-Occurrence Texture (Scoot), which simultaneously considers the block-level spatial structure and co-occurrence texture statistics.
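As a hedged illustration of co-occurrence texture statistics in the spirit of Scoot (the quantization level and the horizontal-pair choice are assumptions, not the official metric):

```python
import numpy as np

def cooccurrence_stats(gray, levels=8):
    """gray: 2-D uint8 image. Returns (contrast, energy) from a simple
    horizontal gray-level co-occurrence matrix."""
    q = (gray.astype(np.float32) / 256.0 * levels).astype(np.int64)  # quantize
    glcm = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # horizontal pairs
    glcm /= max(glcm.sum(), 1.0)                               # normalize to a distribution
    i, j = np.indices(glcm.shape)
    contrast = ((i - j) ** 2 * glcm).sum()
    energy = (glcm ** 2).sum()
    return contrast, energy
```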
The use of RGB-D information for salient object detection has been extensively explored in recent years.
Ranked #4 on RGB-D Salient Object Detection on SSD
This is the first work that explicitly emphasizes the challenge of saliency shift, i.e., that the video salient object(s) may dynamically change.
Ranked #1 on Video Salient Object Detection on DAVSOD-easy35
We design a novel Local Spatial Aware (LSA) layer, which learns to generate Spatial Distribution Weights (SDWs) hierarchically, based on the spatial relationships within a local region, for spatially independent operations. This establishes the relationship between these operations and the spatial distribution, and thus captures local geometric structure sensitively. We further propose LSANet, which builds on the LSA layer to better aggregate spatial information with the associated features at each layer of the network. Experiments show that LSANet achieves performance on par with or better than state-of-the-art methods when evaluated on challenging benchmark datasets.
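A minimal sketch of the LSA idea: an MLP on relative neighbor coordinates produces Spatial Distribution Weights that re-weight neighbor features before aggregation (the shapes and max-pooling aggregation are illustrative assumptions):

```python
import torch
import torch.nn as nn

class LocalSpatialAware(nn.Module):
    """SDWs from neighbor offsets gate neighbor features, then aggregate."""
    def __init__(self, channels):
        super().__init__()
        self.sdw = nn.Sequential(
            nn.Linear(3, channels), nn.ReLU(inplace=True),
            nn.Linear(channels, channels), nn.Sigmoid(),
        )

    def forward(self, rel_xyz, neigh_feat):
        # rel_xyz: (B, N, K, 3) neighbor offsets; neigh_feat: (B, N, K, C)
        weights = self.sdw(rel_xyz)                       # (B, N, K, C) spatial weights
        return (neigh_feat * weights).max(dim=2).values   # (B, N, C) per-point features
```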
Existing binary foreground map (FM) measures address various types of errors in either pixel-wise or structural ways.
However, human perception of the similarity between two sketches considers both structure and texture as essential factors and is not sensitive to slight ("pixel-level") mismatches.
Semantic edge detection (SED), which aims at jointly extracting edges as well as their category information, has far-reaching applications in domains such as semantic segmentation, object proposal generation, and object recognition.
Our analysis identifies a serious design bias of existing SOD datasets which assumes that each image contains at least one clearly outstanding salient object in low clutter.
Our new measure simultaneously evaluates region-aware and object-aware structural similarity between a saliency map (SM) and a ground-truth (GT) map.
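At the top level, such a structure measure is a convex combination of the two terms; a minimal sketch (the object-aware and region-aware sub-scores themselves require the full computations from the paper):

```python
def s_measure(object_score, region_score, alpha=0.5):
    """Convex combination of object-aware (So) and region-aware (Sr) similarity."""
    return alpha * object_score + (1.0 - alpha) * region_score

# usage: s_measure(0.8, 0.7) -> 0.75
```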