Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks.
Applying NeRF to downstream perception tasks for scene understanding and representation is becoming increasingly popular.
Our previous work, the Visual Saliency Transformer (VST), addressed this constraint from a transformer-based sequence-to-sequence perspective, to unify RGB and RGB-D SOD.
no code implementations • 8 Oct 2023 • Tianyang Zhong, Wei Zhao, Yutong Zhang, Yi Pan, Peixin Dong, Zuowei Jiang, Xiaoyan Kui, Youlan Shang, Li Yang, Yaonai Wei, Longtao Yang, Hao Chen, Huan Zhao, Yuxiao Liu, Ning Zhu, Yiwei Li, Yisong Wang, Jiaqi Yao, Jiaqi Wang, Ying Zeng, Lei He, Chao Zheng, Zhixue Zhang, Ming Li, Zhengliang Liu, Haixing Dai, Zihao Wu, Lu Zhang, Shu Zhang, Xiaoyan Cai, Xintao Hu, Shijie Zhao, Xi Jiang, Xin Zhang, Xiang Li, Dajiang Zhu, Lei Guo, Dinggang Shen, Junwei Han, Tianming Liu, Jun Liu, Tuo Zhang
Radiology report generation, as a key step in medical image analysis, is critical to the quantitative analysis of clinically informed decision-making levels.
There has been a recent emphasis on integrating physical models and deep neural networks (DNNs) for SAR target recognition, to improve performance and achieve a higher level of physical interpretability.
We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively.
In this study, we propose a method called Chat2Brain that combines LLMs to basic text-2-image model, known as Text2Brain, to map open-ended semantic queries to brain activation maps in data-scarce and complex query environments.
The past few years have witnessed the immense success of object detection, while current excellent detectors struggle on tackling size-limited instances.
Ranked #1 on Small Object Detection on SODA-D
Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation and co-saliency token-to-token correlation modules.
no code implementations • 21 Apr 2023 • Tianyang Zhong, Yaonai Wei, Li Yang, Zihao Wu, Zhengliang Liu, Xiaozheng Wei, Wenjun Li, Junjie Yao, Chong Ma, Xiang Li, Dajiang Zhu, Xi Jiang, Junwei Han, Dinggang Shen, Tianming Liu, Tuo Zhang
The proposed method uses the strengths of LLMs' understanding and logical reasoning to correct the incomplete logical facts for optimizing the performance of perceptual module, by summarizing and reorganizing reasoning rules represented in natural language format.
Advanced Patch Attacks (PAs) on object detection in natural images have pointed out the great safety vulnerability in methods based on deep neural networks.
We set up novel evaluation benchmarks based on a series of testing sets with evolving distributions.
Ranked #62 on Long-tail Learning on CIFAR-100-LT (ρ=100)
Inspired by the recent success of the Prompting technique, we introduce a new pre-training method that boosts QEIS models by giving Saliency Prompt for queries/kernels.
Current mainstream object detection methods for large aerial images usually divide large images into patches and then exhaustively detect the objects of interest on all patches, no matter whether there exist objects or not.
Few-shot semantic segmentation task aims at performing segmentation in query images with a few annotated support samples.
To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive category knowledge from the query.
Ranked #29 on Few-Shot Semantic Segmentation on PASCAL-5i (1-Shot)
Then, to catalyze the development of SOD, we construct two large-scale Small Object Detection dAtasets (SODA), SODA-D and SODA-A, which focus on the Driving and Aerial scenarios respectively.
The convolution on mesh considers the spatial organization of functional gradients and folding patterns on a cortical sheet and the newly designed channel attention block enhances the interpretability of the contribution of different functional gradients to cortical folding prediction.
To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit specific preferences to appearance or motion modality.
Then, a BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature, based on which we can obtain a BG and DO-free target object segmentation result.
Few-shot segmentation, which aims to segment unseen-class objects given only a handful of densely labeled samples, has received widespread attention from the community.
Ranked #47 on Few-Shot Semantic Segmentation on COCO-20i (1-shot)
Improving the resolution of magnetic resonance (MR) image data is critical to computer-aided diagnosis and brain function analysis.
Semantic segmentation with limited annotations, such as weakly supervised semantic segmentation (WSSS) and semi-supervised semantic segmentation (SSSS), is a challenging task that has attracted much attention recently.
Ranked #26 on Weakly-Supervised Semantic Segmentation on COCO 2014 val
Specifically, we apply an additional branch (base learner) to the conventional FSS model (meta learner) to explicitly identify the targets of base classes, i. e., the regions that do not need to be segmented.
Ranked #19 on Few-Shot Semantic Segmentation on PASCAL-5i (1-Shot)
Based on the exemplar-consultation mechanism, the long-term dependencies can be captured by regarding historical frames as exemplars, while the category-level modeling can be achieved by regarding representative frames from a category as exemplars.
Ranked #6 on Online Action Detection on TVSeries
Recent advances in machine learning and prevalence of digital medical images have opened up an opportunity to address the challenging brain tumor segmentation (BTS) task by using deep convolutional neural networks.
Object rotation is among long-standing, yet still unexplored, hard issues encountered in the task of weakly supervised object detection (WSOD) from aerial images.
Finally, in accordance with the in-depth observations for the methods based on proxy data, we argue that leveraging the proxy data is still an effective way for surrogate training.
Zero-shot object detection aims at incorporating class semantic vectors to realize the detection of (both seen and) unseen classes given an unconstrained test image.
Ranked #2 on Zero-Shot Object Detection on PASCAL VOC'07
Under this circumstance, the models learned from different views can distill valuable knowledge to guide the learning processes of each other.
Current weakly supervised semantic segmentation (WSSS) frameworks usually contain the separated mask-refinement model and the main semantic region mining model.
To this end, we make a pioneering effort to distill helpful knowledge from a heavy network model learned from high-resolution (HR) images to a compact network model that will handle LR images, thus advancing the current knowledge distillation technique with the novel pixel distillation.
Weakly supervised temporal action localization aims at learning the instance-level action pattern from the video-level labels, where a significant challenge is action-context confusion.
In this paper, we first propose a novel physically explainable convolutional neural network for SAR image classification, namely physics guided and injected learning (PGIL).
Nowadays, oriented detectors mostly use horizontal boxes as intermedium to derive oriented boxes from them.
On the other hand, instead of processing the twokinds of data separately, we build a novel dual graph modelto guide the focal stack fusion process using all-focus pat-terns.
In this paper, we propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process.
Ranked #2 on Co-Salient Object Detection on CoSal2015
Current state-of-the-art two-stage detectors generate oriented proposals through time-consuming schemes.
Ranked #8 on Object Detection In Aerial Images on DOTA (using extra training data)
Conventional salient object detection models cannot differentiate the importance of different salient objects.
Though deep learning models can extract the nonlinear relationship, they could not select relevant genetic factors.
Weakly supervised object localization (WSOL) aims at learning to localize objects of interest by only using the image-level labels as the supervision.
In this work, we propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to help the research progress.
Ranked #1 on Unsupervised Semantic Segmentation on ImageNet-S-50
We also develop a token-based multi-task decoder to simultaneously perform saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism.
Ranked #1 on RGB-D Salient Object Detection on NJUD
As an emerging and challenging problem in the computer vision community, weakly supervised object localization and detection plays an important role for developing new generation computer vision systems and has received significant attention in the past decade.
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.
To this end, this paper revisits the role of top-down modeling in salient object detection and designs a novel densely nested top-down flows (DNTDF)-based framework.
Specifically, we design a variational model to formulate the image de-blocking problem and propose two prior terms for the image content and gradient, respectively.
In this paper, we model the information fusion within focal stack via graph networks.
Early fusion and the result fusion schemes fuse RGB and depth information at the input and output stages, respectively, hence incur the problem of distribution gap or information loss.
To address this problem, this paper proposes a novel anchor-free action localization module that assists action localization by temporal points.
The existing methods can be categorized into two localization-by-classification pipelines, i. e., the pre-classification pipeline and the post-classification pipeline.
That is, we seamlessly embed various intra-view information, cross-view multi-dimension bilinear interactive information, and a new view ensemble mechanism into a unified framework to make a decision via the optimization.
In particular, first, we propose to regroup the multi-level features into teacher and student features using a bifurcated backbone strategy (BBS).
Ranked #2 on RGB-D Salient Object Detection on RGBD135
Considering the rapid evolution of this field, this paper provides a systematic survey of deep learning methods for remote sensing image scene classification by covering more than 160 papers.
However, the current survey of datasets and deep learning based methods for object detection in optical remote sensing images is not adequate.
Clustering is an effective technique in data mining to group a set of objects in terms of some attributes.
We propose three specific formulations of the PiCANet via embedding the pixel-wise contextual attention mechanism into the pooling and convolution operations with attending to global or local contexts.
In this paper, we formulate this problem as a Markov Decision Process, where agents are learned to segment object regions under a deep reinforcement learning framework.
This "noisy" motion representation makes it very challenging for pose estimation and action recognition in real scenarios.
Based on this insight, we combine an intra-image fusion stream and a inter-image fusion stream in the proposed framework to generate the learning curriculum and pseudo ground-truth for supervising the training of the deep salient object detector.
We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively.
Ranked #7 on RGB Salient Object Detection on SOC
Recently, researchers have made great processes to build category-specific 3D shape models from 2D images with manual annotations consisting of class labels, keypoints, and ground truth figure-ground segmentations.
Object segmentation in weakly labelled videos is an interesting yet challenging task, which aims at learning to perform category-specific video object segmentation by only using video-level tags.
Weakly-supervised object detection (WOD) is a challenging problems in computer vision.
Ranked #34 on Weakly Supervised Object Detection on PASCAL VOC 2007
During the past years, significant efforts have been made to develop various datasets or present a variety of approaches for scene classification from remote sensing images.
Furthermore, the proposed DSCLSTM model can significantly boost the saliency detection performance by incorporating both global spatial interconnections and scene context modulation, which may uncover novel inspirations for studies on them in computational saliency models.
This is achieved by introducing and learning a rotation-invariant layer and a Fisher discriminative layer, respectively, on the basis of the existing high-capacity CNN architectures.
In real world applications, more and more data, for example, image/video data, are high dimensional and represented by multiple views which describe different perspectives of the data.
Aiming at automatically discovering the common objects contained in a set of relevant images and segmenting them as foreground simultaneously, object co-segmentation has become an active research topic in recent years.
Then a novel hierarchical recurrent convolutional neural network (HRCNN) is adopted to further hierarchically and progressively refine the details of saliency maps step by step via integrating local context information.
Ranked #14 on RGB Salient Object Detection on DUTS-TE (F-measure metric)
Co-saliency detection is a newly emerging and rapidly growing research area in computer vision community.
As an interesting and emerging topic, co-saliency detection aims at simultaneously extracting common salient objects in a group of images.
Part model-based methods have been successfully applied to object detection and scene classification and have achieved state-of-the-art results.
It is believed that eye movements in free-viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors.
In the proposed framework, the wide and deep information are explored for the object proposal windows extracted in each image, and the co-saliency scores are calculated by integrating the intra-image contrast and intra group consistency via a principled Bayesian formulation.