To tackle this issue, we make an early effort to study temporal action localization from the perspective of multi-modality feature learning, based on the observation that different actions exhibit distinct preferences for the appearance or the motion modality.
Then, a BG Eliminating Module and a DO Eliminating Module are proposed to successively filter out the BG and DO information from the query feature, based on which we can obtain a BG- and DO-free target object segmentation result.
Then to remove the bias in GNN estimation, we propose a novel Debiased Graph Neural Networks (DGNN) with a differentiated decorrelation regularizer.
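As a rough illustration only (not the paper's actual DGNN formulation), one common way to encourage decorrelated representations is to penalize the off-diagonal entries of the feature covariance matrix. The function name `decorrelation_penalty` and this particular penalty are illustrative assumptions, sketched in numpy:

```python
import numpy as np

def decorrelation_penalty(feats: np.ndarray) -> float:
    """Penalize off-diagonal entries of the feature covariance matrix.

    feats: (n_samples, n_dims) node embeddings.
    Returns a scalar penalty; zero when feature dimensions are uncorrelated.
    """
    centered = feats - feats.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / max(len(feats) - 1, 1)
    off_diag = cov - np.diag(np.diag(cov))
    return float(np.sum(off_diag ** 2))
```

Adding such a term to the training loss pushes feature dimensions toward statistical independence; the paper's differentiated variant would additionally weight which correlations to suppress.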
Furthermore, we preserve the performance of both the estimated views and the final view while reducing the mutual information between every pair of views.
On the other hand, instead of processing the two kinds of data separately, we build a novel dual graph model to guide the focal stack fusion process using all-focus patterns.
In this paper, we propose a novel consensus-aware dynamic convolution model to explicitly and effectively perform the "summarize and search" process.
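The "summarize and search" idea can be sketched minimally in numpy: pool the group's feature maps into one consensus vector, then use it as a dynamic 1×1 kernel to locate consensus regions in each image. This is a simplified assumption of the mechanism, not the model's actual architecture; `consensus_search` is a hypothetical name:

```python
import numpy as np

def consensus_search(features: np.ndarray) -> np.ndarray:
    """'Summarize and search' with a consensus-derived 1x1 dynamic kernel.

    features: (n_images, H, W, C) feature maps of an image group.
    Returns (n_images, H, W) response maps highlighting consensus regions.
    """
    # Summarize: average-pool all maps into one consensus vector (C,).
    consensus = features.mean(axis=(0, 1, 2))
    consensus /= (np.linalg.norm(consensus) + 1e-8)
    # Search: apply the consensus vector as a dynamic 1x1 convolution kernel.
    return np.einsum('nhwc,c->nhw', features, consensus)
```

Pixels whose features align with the group consensus receive the highest responses, which is the behavior a co-salient object detector wants.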
Ranked #2 on Co-Salient Object Detection on CoSal2015
Conventional salient object detection models cannot differentiate the importance of different salient objects.
Camouflaged object detection (COD) is a challenging task due to the low boundary contrast between the object and its surroundings.
Then, cross-view contrastive learning, together with a view mask mechanism, is proposed to extract positive and negative embeddings from the two views.
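A minimal numpy sketch of cross-view contrastive learning with a mask, under the standard InfoNCE assumption that the same node in the two views forms the positive pair and other nodes act as negatives. The function name `contrastive_loss` and the cosine/temperature details are illustrative, not taken from the paper:

```python
import numpy as np

def contrastive_loss(view_a, view_b, mask, tau=0.5):
    """InfoNCE-style loss between two views with a view mask.

    view_a, view_b: (n, d) embeddings of the same n nodes in two views.
    mask: (n,) boolean; masked-out nodes are excluded from the loss.
    The positive pair is (a_i, b_i); all other nodes act as negatives.
    """
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    sim = a @ b.T / tau                       # (n, n) similarity logits
    logits = sim - sim.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    per_node = -np.diag(log_prob)             # -log p(positive | anchor)
    return float(per_node[mask].mean())
```

Correctly aligned views yield a lower loss than mismatched ones, which is what drives the two views' embeddings of the same node together.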
We also develop a token-based multi-task decoder to simultaneously perform saliency and boundary detection by introducing task-related tokens and a novel patch-task-attention mechanism.
Ranked #1 on RGB-D Salient Object Detection on NJUD
We also find that the performance of some hyperbolic GCNs can be improved by simply replacing the graph operations with those we defined in this paper.
Significant performance improvements have been achieved for fully-supervised video salient object detection using pixel-wise labeled training datasets, which are, however, time-consuming and expensive to obtain.
In this paper, we model the information fusion within focal stack via graph networks.
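As a toy illustration of graph-based fusion (not the paper's network), each focal slice can be treated as a node and updated with a similarity-weighted message-passing step; `graph_fuse` and the cosine-similarity adjacency are assumptions for this sketch:

```python
import numpy as np

def graph_fuse(slices: np.ndarray) -> np.ndarray:
    """One round of graph message passing over focal-stack slices.

    slices: (n_slices, d) one feature vector per focal slice (node).
    Edge weights come from pairwise cosine similarity; each node is
    updated with the similarity-weighted average of all nodes.
    """
    normed = slices / (np.linalg.norm(slices, axis=1, keepdims=True) + 1e-8)
    adj = normed @ normed.T                                      # similarity graph
    adj = np.exp(adj) / np.exp(adj).sum(axis=1, keepdims=True)   # row-softmax
    return adj @ slices                                          # message passing
```

Stacking such rounds (with learned transforms in a real model) lets information flow between slices focused at different depths.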
Early fusion and result fusion schemes fuse RGB and depth information at the input and output stages, respectively, and hence incur the problems of a distribution gap or information loss.
Considering the reliability of the other modality's attention, we further propose a selection attention mechanism to weight the newly added attention term.
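A minimal sketch of the selection idea: gate the borrowed modality's attention by a reliability score before adding it. Here the score is a simple cross-modal feature agreement (cosine mapped to [0, 1]); a real model would learn it, and the name `selection_attention` is hypothetical:

```python
import numpy as np

def selection_attention(attn_self, attn_other, feat_self, feat_other):
    """Weight a borrowed attention map by a reliability (selection) score.

    attn_self, attn_other: (n,) attention maps of the two modalities.
    The selection score is the cosine agreement between the modalities'
    features, mapped to [0, 1]; unreliable attention is suppressed.
    """
    num = float(feat_self @ feat_other)
    denom = np.linalg.norm(feat_self) * np.linalg.norm(feat_other) + 1e-8
    select = (num / denom + 1.0) / 2.0   # cosine in [-1, 1] -> score in [0, 1]
    return attn_self + select * attn_other
```

When the modalities agree, the borrowed attention contributes fully; when they conflict, it is gated toward zero.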
Ranked #19 on RGB-D Salient Object Detection on NJU2K
In this paper, we propose an online Multi-Object Tracking (MOT) approach which integrates the merits of single object tracking and data association methods in a unified framework to handle noisy detections and frequent interactions between targets.
Ranked #5 on Online Multi-Object Tracking on MOT16
We propose three specific formulations of PiCANet by embedding the pixel-wise contextual attention mechanism into pooling and convolution operations, attending to either global or local contexts.
We formulate the proposed PiCANet in both global and local forms to attend to global and local contexts, respectively.
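The global form of pixel-wise contextual attention can be sketched in numpy: every pixel attends over all pixels via softmax-normalized similarities and pools a context feature. This is a schematic dot-product variant for illustration, not PiCANet's actual (ReNet-based) formulation:

```python
import numpy as np

def global_context_attention(feats: np.ndarray) -> np.ndarray:
    """Pixel-wise global contextual attention, in the spirit of PiCANet.

    feats: (H, W, C). For every pixel, attend over ALL pixels (softmax of
    dot-product similarity) and pool an attended context feature, so
    informative regions contribute more to each location.
    """
    h, w, c = feats.shape
    flat = feats.reshape(h * w, c)
    logits = flat @ flat.T                           # (HW, HW) similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    context = attn @ flat                            # attended context
    return context.reshape(h, w, c)
```

The local form would restrict each pixel's attention to a surrounding window instead of the whole map.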
Ranked #7 on RGB Salient Object Detection on SOC
Furthermore, the proposed DSCLSTM model can significantly boost saliency detection performance by incorporating both global spatial interconnections and scene context modulation, which may inspire further studies of these mechanisms in computational saliency models.
Then a novel hierarchical recurrent convolutional neural network (HRCNN) is adopted to hierarchically and progressively refine the details of the saliency maps by integrating local context information.
Ranked #17 on RGB Salient Object Detection on DUTS-TE
It is believed that eye movements in free-viewing of natural scenes are directed by both bottom-up visual saliency and top-down visual factors.