Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks.
Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular.
A surge of interest has emerged in weakly supervised semantic segmentation in recent years due to its remarkable annotation efficiency.
We set up novel evaluation benchmarks based on a series of testing sets with evolving distributions.
Ranked #62 on Long-tail Learning on CIFAR-100-LT (ρ=100)
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.
It can model the feature space more comprehensively and reduce the dominance of head classes.
Most existing methods that cope with noisy labels assume that the class distributions are well balanced, leaving them with insufficient capacity to deal with practical scenarios where training samples have imbalanced distributions.
In CTN, a transformer module is constructed in parallel to a U-Net to learn long-distance dependencies between different anatomical regions; and these dependencies are communicated to the U-Net at multiple stages to endow it with global awareness.
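The multi-stage communication idea above can be sketched as a toy: a global (transformer-style) branch summarizes long-distance context and injects it into a local (U-Net-style) branch after each stage. All function names and the toy arithmetic below are illustrative assumptions, not the authors' implementation.

```python
def global_context(features):
    """Stand-in for the transformer branch: a global summary (mean) of all positions."""
    mean = sum(features) / len(features)
    return [mean] * len(features)

def unet_stage(features):
    """Stand-in for one U-Net stage: a purely local transformation."""
    return [2.0 * f for f in features]

def ctn_forward(features, num_stages=3):
    """Run the local stages, communicating global context to them after each stage."""
    for _ in range(num_stages):
        features = unet_stage(features)
        ctx = global_context(features)
        # inject long-distance dependencies into the local branch
        features = [f + c for f, c in zip(features, ctx)]
    return features
```

The point of the sketch is only the wiring: the local branch never sees distant positions directly, yet every output depends on all inputs through the injected context.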
Although deep learning algorithms have been intensively developed for computer-aided tuberculosis diagnosis (CTD), they mainly depend on carefully annotated datasets, leading to much time and resource consumption.
To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit specific preferences to appearance or motion modality.
This design makes the presented transformer model a hybrid of 1) top-down and bottom-up attention pathways and 2) dynamic and static routing pathways.
Improving the resolution of magnetic resonance (MR) image data is critical to computer-aided diagnosis and brain function analysis.
To properly address this problem, we propose a novel density-variational learning framework that improves the robustness of the image dehazing model with the aid of a variety of negative hazy images, so as to better deal with various complex hazy scenarios.
Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently.
Ranked #27 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
Based on the exemplar-consultation mechanism, the long-term dependencies can be captured by regarding historical frames as exemplars, while the category-level modeling can be achieved by regarding representative frames from a category as exemplars.
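The exemplar-consultation mechanism can be illustrated with a minimal sketch: score a frame feature by its best similarity to a set of exemplar features, which works the same whether the exemplars are historical frames or category representatives. The function names here are hypothetical, not from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def consult_exemplars(frame_feature, exemplars):
    """Score a frame by its highest similarity to any exemplar feature."""
    return max(cosine(frame_feature, e) for e in exemplars)
```

Passing past frames as `exemplars` gives the long-term-dependency view; passing per-category representative frames gives the category-level view.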
Ranked #6 on Online Action Detection on TVSeries
Recent advances in machine learning and prevalence of digital medical images have opened up an opportunity to address the challenging brain tumor segmentation (BTS) task by using deep convolutional neural networks.
Zero-shot object detection aims at incorporating class semantic vectors to realize the detection of (both seen and) unseen classes given an unconstrained test image.
Ranked #2 on Zero-Shot Object Detection on PASCAL VOC'07
Under this circumstance, the models learned from different views can distill valuable knowledge to guide the learning processes of each other.
To this end, we make a pioneering effort to distill helpful knowledge from a heavy network model learned from high-resolution (HR) images to a compact network model that will handle LR images, thus advancing the current knowledge distillation technique with the novel pixel distillation.
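A minimal sketch of the pixel-distillation idea, under the assumption (mine, not the paper's) that the LR student's lower-resolution features are upsampled to the HR teacher's resolution and matched with a squared error. Both helper names are hypothetical.

```python
def upsample_nearest(feats, length):
    """Nearest-neighbor upsampling of a 1-D student feature sequence."""
    return [feats[i * len(feats) // length] for i in range(length)]

def pixel_distill_loss(teacher, student):
    """Mean squared error between teacher features and the upsampled student features."""
    up = upsample_nearest(student, len(teacher))
    return sum((t - s) ** 2 for t, s in zip(teacher, up)) / len(teacher)
```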
Current weakly supervised semantic segmentation (WSSS) frameworks usually contain a separate mask-refinement model and a main semantic region mining model.
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion.
On the other hand, instead of processing the two kinds of data separately, we build a novel dual graph model to guide the focal stack fusion process using all-focus patterns.
In this paper, we propose a single image dehazing method with an independent Detail Recovery Network (DRN), which considers capturing the details from the input image over a separate network and then integrates them into a coarse dehazed image.
Weakly supervised object localization (WSOL) aims at learning to localize objects of interest by only using the image-level labels as the supervision.
Semantic segmentation models gain robustness against poor lighting conditions by virtue of complementary information from visible (RGB) and thermal images.
Ranked #23 on Thermal Image Segmentation on MFN Dataset
Leveraging the constant structure and disease relations extracted from domain knowledge, we propose a structure-aware relation network (SAR-Net) that extends Mask R-CNN.
As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role in developing new-generation computer vision systems and has received significant attention in the past decade.
A fundamental challenge in training the existing deep saliency detection models is the requirement of large amounts of annotated data.
Onfocus detection aims at identifying whether the focus of the individual captured by a camera is on the camera or not.
To this end, this paper revisits the role of top-down modeling in salient object detection and designs a novel densely nested top-down flows (DNTDF)-based framework.
In this paper, we model the information fusion within focal stack via graph networks.
To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
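In anchor-free temporal localization of this kind, a common formulation is to regress, from a single temporal point, the offsets to the action's start and end; the sketch below illustrates decoding an interval from such a point, with clamping to the video duration. The decoding form is a standard anchor-free convention assumed here, not necessarily the paper's exact parameterization.

```python
def decode_action(point, start_offset, end_offset, duration):
    """Decode an action interval (start, end) from a temporal point and two offsets."""
    start = max(0.0, point - start_offset)
    end = min(duration, point + end_offset)
    return start, end
```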
The existing methods can be categorized into two localization-by-classification pipelines, i.e., the pre-classification pipeline and the post-classification pipeline.
In this way, even though the overall video saliency quality depends heavily on the spatial branch, the performance of the temporal branch still matters.
CoSOD is an emerging and rapidly growing extension of salient object detection (SOD), which aims to detect the co-occurring salient objects in a group of images.
Ranked #6 on Co-Salient Object Detection on CoCA
This "noisy" motion representation makes it very challenging for pose estimation and action recognition in real scenarios.
In this paper, we formulate this problem as a Markov Decision Process, where agents are learned to segment object regions under a deep reinforcement learning framework.
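To make the MDP framing concrete, here is a deliberately tiny toy (not the paper's agent): the "image" is a 1-D strip, the state is the current pixel, the action is whether to include the pixel in the object mask, and a tabular Q-learning agent is rewarded for decisions matching the ground truth. Everything below is an illustrative assumption.

```python
import random

def train(labels, episodes=500, alpha=0.5, gamma=0.9, eps=0.1, seed=0):
    """Learn an include/exclude policy over a 1-D strip with tabular Q-learning."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(len(labels)) for a in (0, 1)}
    for _ in range(episodes):
        for s, truth in enumerate(labels):
            # epsilon-greedy action selection
            if rng.random() < eps:
                a = rng.choice((0, 1))
            else:
                a = max((0, 1), key=lambda x: q[(s, x)])
            r = 1.0 if a == truth else -1.0  # reward: match the ground-truth mask
            nxt = max(q[(s + 1, 0)], q[(s + 1, 1)]) if s + 1 < len(labels) else 0.0
            q[(s, a)] += alpha * (r + gamma * nxt - q[(s, a)])
    # greedy policy = predicted segmentation mask
    return [max((0, 1), key=lambda a: q[(s, a)]) for s in range(len(labels))]
```

In the actual deep RL setting the tabular Q would be replaced by a network over image features, but the state/action/reward decomposition is the same shape.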
Specifically, DTR-GAN learns a dilated temporal relational generator and a discriminator with three-player loss in an adversarial manner.
Unsupervised video summarization plays an important role in digesting, browsing, and searching the ever-growing number of videos every day, yet the underlying fine-grained semantic and motion information (i.e., objects of interest and their key motions) in online videos has barely been touched.
Based on this insight, we combine an intra-image fusion stream and an inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground truth for supervising the training of the deep salient object detector.
Object segmentation in weakly labelled videos is an interesting yet challenging task, which aims at learning to perform category-specific video object segmentation by only using video-level tags.
Recently, researchers have made great progress in building category-specific 3D shape models from 2D images with manual annotations consisting of class labels, keypoints, and ground-truth figure-ground segmentations.
Weakly-supervised object detection (WOD) is a challenging problem in computer vision.
Ranked #34 on Weakly Supervised Object Detection on PASCAL VOC 2007
Object co-segmentation, which aims to automatically discover the common objects contained in a set of relevant images and simultaneously segment them as foreground, has become an active research topic in recent years.
Co-saliency detection is a newly emerging and rapidly growing research area in computer vision community.
As an interesting and emerging topic, co-saliency detection aims at simultaneously extracting common salient objects in a group of images.
In the proposed framework, wide and deep information is explored for the object proposal windows extracted from each image, and the co-saliency scores are calculated by integrating intra-image contrast and intra-group consistency via a principled Bayesian formulation.
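One simple Bayesian combination in this spirit, given here purely as an illustrative sketch (an assumed form, not the paper's exact formulation): treat the contrast score and the group-consistency score of a proposal window as independent likelihood terms for "co-salient" versus "background", and compute a posterior.

```python
def co_saliency_score(contrast, consistency, prior=0.5):
    """Posterior-style co-saliency score for one proposal window.

    `contrast` and `consistency` in [0, 1] are read as likelihoods of the
    observed evidence under the co-salient hypothesis; their complements
    serve as the background likelihoods.
    """
    num = contrast * consistency * prior
    evidence = num + (1 - contrast) * (1 - consistency) * (1 - prior)
    return num / evidence
```

Windows that score high on both cues get a posterior near 1, while a window that is locally salient but inconsistent across the group is pulled back down.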
It is believed that eye movements in free-viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors.