Real-world data is often extremely imbalanced and follows a long-tailed distribution, resulting in models that are biased towards classes with sufficient samples and perform poorly on rare classes.
To be precise, we use our proposed Dual-Guided Attention (DGA) module to replace some multi-scale transformations with attention computation, so that only a few attention layers of near-linear complexity are needed to achieve performance comparable to the frequently used multi-layer fusion.
Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations to clean inputs.
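As a minimal sketch of how such a perturbation can be crafted, the example below applies the fast gradient sign method (FGSM) to a toy logistic-regression classifier. FGSM is only one common attack and is an assumption here, not necessarily the method studied in the paper; the weights `w`, bias `b`, input `x`, and budget `epsilon` are hypothetical values chosen for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    # Probability of the positive class under a logistic-regression model.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return (v > 0) - (v < 0)

def fgsm(w, b, x, y, epsilon):
    # For binary cross-entropy, the gradient of the loss w.r.t. the input
    # is (p - y) * w, where p is the predicted probability.
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    # Step each coordinate by epsilon in the direction that increases the loss.
    return [xi + epsilon * sign(g) for xi, g in zip(x, grad)]

# Hypothetical toy model and clean input with true label y = 1.
w, b = [2.0, -3.0, 1.0], 0.1
x, y = [0.5, -0.2, 0.3], 1
x_adv = fgsm(w, b, x, y, epsilon=0.05)

print("clean confidence:      ", predict(w, b, x))
print("adversarial confidence:", predict(w, b, x_adv))
```

Each coordinate of `x_adv` differs from `x` by at most `epsilon`, yet the model's confidence in the true label drops. On deep networks the same step is computed with automatic differentiation rather than a closed-form gradient.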
By combining the DPRNN module with a Convolution Recurrent Network (CRN), DPCRN achieved promising performance in speech separation with a limited model size.
Masked image modeling (MIM) with transformer backbones has recently been exploited as a powerful self-supervised pre-training technique.
Then, we develop a progressive aggregation module to enhance the spatial and temporal characteristics of feature maps and to effectively integrate the three kinds of features.
To enable MedRPG to locate nuanced medical findings with better region-phrase correspondences, we further propose Tri-attention Context contrastive alignment (TaCo).
In Parallel Continual Learning (PCL), multiple parallel tasks start and end training unpredictably, and thus suffer from training conflicts and catastrophic forgetting.
Continual Learning (CL) sequentially learns new tasks like human beings, with the goal of achieving better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks).
Preoperative and noninvasive prediction of the meningioma grade is important in clinical practice, as it directly influences clinical decision-making.
Multi-modal MR imaging is routinely used in clinical practice to diagnose and investigate brain tumors by providing rich complementary information.
Traditionally, the primary goal of LL is to achieve a trade-off between Stability (remembering past tasks) and Plasticity (adapting to new tasks).
Our DID-Net predicts the three component maps by progressively integrating features across scales, and refines each map by passing it through an independent refinement network.
Ranked #5 on Image Dehazing on Haze4k
The bottleneck is the lack of a well-established dataset with high-quality annotations for video shadow detection.
We also theoretically prove the invariance of our ALR approach to the ambiguity of normal and lighting decomposition.
In this paper, we address these two problems by constructing a Blurred Video Tracking benchmark, which contains a variety of videos with different levels of motion blur, as well as ground-truth tracking results for evaluating trackers.
However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views.
Knowledge Distillation (KD) aims at improving the performance of a low-capacity student model by inheriting knowledge from a high-capacity teacher model.
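One common way to let a student inherit knowledge from a teacher is to match temperature-softened output distributions. The sketch below assumes that classic soft-target formulation, which may differ from the KD variant used in the paper; the function names, `temperature`, and `alpha` are hypothetical choices for illustration.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over temperature-scaled logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, temperature=4.0, alpha=0.5):
    # Soft-target term: KL(teacher || student) on softened distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student_soft) if pt > 0)
    # Hard-target term: cross-entropy with the ground-truth label.
    ce = -math.log(softmax(student_logits)[label])
    # The T^2 factor keeps the soft-term gradients on a comparable scale.
    return alpha * ce + (1 - alpha) * temperature ** 2 * kl

# Hypothetical logits for a 3-class problem with true class 2.
loss = kd_loss([1.0, 0.5, 2.0], [0.2, 0.1, 3.0], label=2)
print(loss)
```

When the student and teacher logits are identical the soft-target term vanishes, and the loss reduces to the weighted cross-entropy on the label.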
How to effectively learn the temporal variation of target appearance and exclude interference from cluttered backgrounds, while maintaining real-time response, is an essential problem in visual object tracking.
Ranked #5 on Visual Object Tracking on OTB-2013
To guarantee the sensitivity and accuracy of detecting minute changes, in each observation we capture a group of images under multiple illuminations, which need only be roughly aligned to the lighting conditions of the previous observation.
For this purpose, we aim at constructing a maximum cohesive SP-grid, which is composed of real nodes, i.e., SPs, and dummy nodes, which carry no meaning in the image and serve only as placeholders in the grid.