Real-world data is extremely imbalanced and presents a long-tailed distribution, resulting in models that are biased towards classes with sufficient samples and perform poorly on rare classes.
We conducted extensive experiments on the dermatology dataset ISIC 2019, and the experimental results show that our approach can effectively leverage knowledge from known categories to discover new semantic categories.
The most common type of lung cancer, lung adenocarcinoma (LUAD), has been increasingly detected since the advent of low-dose computed tomography screening technology.
Deep neural networks are vulnerable to adversarial examples crafted by applying human-imperceptible perturbations on clean inputs.
Then these features are fed into a policy network to intelligently select a subsequence to process.
Ranked #5 on Sign Language Recognition on CSL-Daily
Product image segmentation is vital in e-commerce.
(2) Building electrification played a significant role in decarbonizing space cooling (-87.7 in China and -130.2 kilograms of carbon dioxide (kgCO2) per household in India) and appliances (-169.7 in China and -43.4 kgCO2 per household in India).
To advance research in this field, we have constructed a poster layout dataset named CGL-Dataset V2.
State-of-the-art shadow removal methods train deep neural networks on collected pairs of shadow and shadow-free images; these networks are expected to complete two distinct tasks via shared weights, i.e., data restoration for shadow regions and identity mapping for non-shadow regions.
Structure-based drug design powered by deep generative models has attracted increasing research interest in recent years.
The challenges faced by text classification with large tag systems in natural language processing tasks include multiple tag systems, uneven data distribution, and high noise.
Therefore, in this paper, we propose a unified framework to leverage these unseen unlabeled data for open-scenario semi-supervised medical image classification.
To benefit from the complementary information between heterogeneous data, we introduce a new Multimodal Transformer (MMFormer) for Remote Sensing (RS) image classification using Hyperspectral Image (HSI) data accompanied by another source of data such as Light Detection and Ranging (LiDAR).
Visualizations demonstrate the effects of CorrNet on emphasizing human body trajectories across adjacent frames.
In this work, we find that pretraining shadow removal networks on the image inpainting dataset can reduce the shadow remnants significantly: a naive encoder-decoder network gets competitive restoration quality w.r.t.
Extensive experiments on three shadow removal benchmarks demonstrate that our method outperforms existing shadow removal methods, and our StructNet can be integrated with existing methods to boost their performances further.
In Parallel Continual Learning (PCL), multiple parallel tasks start and end training unpredictably, and thus suffer from training conflict and catastrophic forgetting.
We tackle a new problem of multi-view camera and subject registration in the bird's eye view (BEV) without pre-given camera calibration.
Recently, segmentation-based methods have become quite popular in scene text detection; they mainly consist of two steps: text kernel segmentation and expansion.
To relieve this problem, we propose a self-emphasizing network (SEN) to emphasize informative spatial regions in a self-motivated way, with few extra computations and without additional expensive supervision.
Ranked #6 on Sign Language Recognition on CSL-Daily
In contrast, the inter-task relationships leverage hard and soft labels from data and a constructed expert network.
In this paper, we study a new problem of unsupervised domain adaptive gait recognition (UDA-GR), which learns a gait identifier with supervised labels from indoor scenes (source domain) and applies it to outdoor wild scenes (target domain).
In this paper, we focus on the relatively new yet practical problem of clothes-changing video-based person re-identification (CCVReID), which is less studied.
This screening algorithm is customer-oriented and offers personalized commodities by preventing unqualified sellers from participating in the transaction.
In this work, we introduce image matting into 3D scenes and use the alpha matte, i.e., a soft mask, to describe lesions in a 3D medical image.
Continual Learning (CL) sequentially learns new tasks like human beings, with the goal of achieving better Stability (S, remembering past tasks) and Plasticity (P, adapting to new tasks).
Skeleton-based human action recognition has been drawing more interest recently due to its low sensitivity to appearance changes and the accessibility of more skeleton data.
It can be caused by many factors, such as the imaging properties, pathological anatomy, and the weak representation of the binary masks, which brings challenges to accurate 3D segmentation.
This letter presents a sensing-communication-computing-control (SC3) integrated satellite unmanned aerial vehicle (UAV) network, where the UAV is equipped with on-board sensors, mobile edge computing (MEC) servers, base stations, and a satellite communication module.
In such conditions, short-term dependencies, which are critical for classifying similar actions, are seldom formally considered.
Ranked #1 on Skeleton Based Action Recognition on Kinetics-400
The sixth-generation (6G) network will shift its focus to supporting everything including various machine-type devices (MTDs) in an everyone-centric manner.
In this paper, we derive temporal lift pooling (TLP) from the Lifting Scheme in signal processing to intelligently downsample features of different temporal hierarchies.
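For readers unfamiliar with the Lifting Scheme mentioned above, the following is a minimal sketch of one classical lifting step (split, predict, update) with fixed Haar-style operators. It is not the paper's TLP: the function names `lifting_downsample` and `lifting_upsample` are hypothetical, and the fixed predict and update rules are an assumption made purely for illustration.

```python
def lifting_downsample(x):
    """One Haar-style lifting step on an even-length sequence.

    Returns (approx, detail), each half the length of x.
    """
    if len(x) % 2 != 0:
        raise ValueError("sequence length must be even")
    even, odd = x[0::2], x[1::2]                          # split
    detail = [o - e for e, o in zip(even, odd)]           # predict odd from even
    approx = [e + d / 2 for e, d in zip(even, detail)]    # update even with detail
    return approx, detail


def lifting_upsample(approx, detail):
    """Invert lifting_downsample by undoing update, then predict, then merging."""
    even = [a - d / 2 for a, d in zip(approx, detail)]
    odd = [e + d for e, d in zip(even, detail)]
    merged = []
    for e, o in zip(even, odd):
        merged.extend([e, o])
    return merged


if __name__ == "__main__":
    signal = [1.0, 3.0, 2.0, 6.0]
    approx, detail = lifting_downsample(signal)
    print(approx, detail)
    print(lifting_upsample(approx, detail))
```

The approximation branch is the downsampled signal (here, pairwise averages), while the detail branch keeps what is needed to reconstruct the input exactly, which is the property that makes lifting attractive as a pooling operator.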
Existing unsupervised domain adaptation methods based on adversarial learning have achieved good performance in several medical imaging tasks.
Visual object tracking is an important task in computer vision with many real-world applications, e.g., video surveillance and visual navigation.
Active camera relocalization (ACR) is a new problem in computer vision that significantly reduces the false alarms caused by image distortions due to camera pose misalignment in fine-grained change detection (FGCD).
As a result, the final method takes advantage of effective semantic and image-level filling for high-fidelity inpainting.
The key challenges of LML image recognition are the construction of label relationships on Partial Labels of training data and the Catastrophic Forgetting on old classes, resulting in poor generalization.
To obtain a more comprehensive understanding of activity in a crowded scene, in this paper we propose a new problem of panoramic human activity recognition (PAR), which aims to simultaneously achieve individual action, social group activity, and global activity recognition.
The core of human group detection is the human social relation representation and division. In this paper, we propose a new two-stage multi-head framework for human group detection.
Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth generation (5G) networks and beyond.
First, we propose the uncertainty-aware cascaded predictive filtering (UC-PFilt) that can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively.
In this paper, we develop a new approach that can simultaneously handle three tasks: i) localizing the side-view camera in the top view; ii) estimating the view direction of the side-view camera; iii) detecting and associating the same subjects on the ground across the complementary views.
Our results show that Markov dependence affects the generalization error by discounting the sample size by a multiplicative factor that depends on the spectral gap of the underlying Markov chain.
The observation of this work motivates us to design a novel detection-aware shadow removal framework, which empowers shadow removal to achieve higher restoration quality and enhance the shadow robustness of deployed facial landmark detectors.
Traditionally, the primary goal of LL is to achieve the trade-off between the Stability (remembering past tasks) and Plasticity (adapting to new tasks).
To this end, we first propose the physical model-based adversarial relighting attack (ARA), denoted as the albedo-quotient-based adversarial relighting attack (AQ-ARA).
Our DID-Net predicts the three component maps by progressively integrating features across scales, and refines each map by passing it through an independent refinement network.
Ranked #5 on Image Dehazing on Haze4k
In this work, we explore unsupervised domain adaptation in retinal vessel segmentation by using entropy-based adversarial learning and transfer normalization layer to train a segmentation network, which generalizes well across domains and requires no annotation of the target domain.
3D point cloud completion is very challenging because it relies heavily on an accurate understanding of complex 3D shapes (e.g., high-curvature, concave/convex, and hollowed-out 3D shapes) and the unknown and diverse patterns of the partially available point clouds.
To overcome the lack of character-level annotations, we propose a novel weakly-supervised character center detection module, which only uses word-level annotated real images to generate character-level labels.
In both cases, Sparta leads to CNNs with higher robustness than the vanilla ReLU, verifying the flexibility and versatility of the proposed method.
We also visualize the correlation matrices, which inspire us to jointly apply different perturbations to improve the success rate of the attack.
Nevertheless, communication and MEC systems are coupled with each other under the influence of complex propagation environment in the MEC-empowered NTN, which makes it hard to orchestrate the resources.
We conduct extensive experiments on the ISTD, ISTD+, and SRD datasets to validate our method's effectiveness and show better performance in shadow regions and comparable performance in non-shadow regions over the state-of-the-art methods.
The Gravity Recovery and Climate Experiment (GRACE) satellite and its successor GRACE Follow-On (GRACE-FO) provide valuable and accurate observations of terrestrial water storage anomalies (TWSAs) at a global scale.
Moreover, comprehensive evaluations have demonstrated two important properties of our method: First, superior transferability across DNNs.
Rehearsal, seeking to remind the model by storing old knowledge in lifelong learning, is one of the most effective ways to mitigate catastrophic forgetting, i.e., biased forgetting of previous knowledge when moving to new tasks.
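As a rough illustration of rehearsal, the sketch below keeps a fixed-size memory of past samples that can be replayed alongside new-task data. Reservoir sampling is used here only as one common way to fill such a memory; it is an assumption for illustration, not the method of the paper, and the class name `RehearsalBuffer` is hypothetical.

```python
import random


class RehearsalBuffer:
    """Fixed-capacity memory filled by reservoir sampling (illustrative only)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0                      # total samples offered so far
        self._rng = random.Random(seed)

    def add(self, sample):
        """Offer one sample; each offered sample is kept with equal probability."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(sample)
            return
        slot = self._rng.randrange(self.seen)
        if slot < self.capacity:
            self.items[slot] = sample      # overwrite a stored sample

    def replay(self, k):
        """Draw up to k stored samples to mix into the current training batch."""
        return self._rng.sample(self.items, min(k, len(self.items)))


if __name__ == "__main__":
    buffer = RehearsalBuffer(capacity=5)
    for i in range(100):                   # stand-in for a stream of old-task data
        buffer.add(i)
    print(buffer.items, buffer.replay(3))
```

During training on a new task, a batch would typically be formed from new samples plus `buffer.replay(k)`, so that gradients still reflect earlier tasks.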
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector using only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data.
In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack.
To defend the DNNs from the negative rain effect, we also present a defensive deraining strategy, for which we design an adversarial rain augmentation that uses mixed adversarial rain layers to enhance deraining models for downstream DNN perception.
To fill this gap, in this paper we regard single-image deraining as a general image-enhancing problem and propose a model-free deraining method, i.e., EfficientDeRain, which is able to process a rainy image within 10 ms (around 6 ms on average), over 80 times faster than the state-of-the-art method (i.e., RCDNet), while achieving similar deraining effects.
Only a few research efforts have been devoted to other random delay characteristics, such as the delay bound violation probability and the probability distribution of the delay, by decoupling the transmission and computation processes of MEC.
In fifth generation (5G) and beyond Internet of Things (IoT), it becomes increasingly important to serve a massive number of IoT devices outside the coverage of terrestrial cellular networks.
To this end, we initiate the very first attempt to study this problem from the perspective of adversarial attack and propose the adversarial denoise attack.
As GAN-based face image and video generation techniques, widely known as DeepFakes, have become increasingly mature and realistic, there is a pressing demand for effective DeepFake detectors.
In this paper, we study the integration of UAVs with existing MCNs, and investigate the potential gains of hybrid satellite-UAV-terrestrial networks for maritime coverage.
Paired egocentric interaction recognition (PEIR) is the task of collaboratively recognizing the interactions between two persons using the videos from their corresponding views.
In this paper, we argue that for REC the referring expression and the target region are semantically correlated, and that subject, location, and relationship consistency exist between vision and language. On top of this, we propose a novel approach called MutAtt to construct mutual guidance between vision and language, which treats vision and language equally and thus yields compact information matching.
Besides, we also theoretically prove the invariance of our ALR approach to the ambiguity of normal and lighting decomposition.
By using the public data from Jan. 20 to Feb. 11, 2020, we perform data-driven analysis and forecasting on the COVID-19 epidemic in mainland China, especially Hubei province.
Besides, the attack is further enhanced by adaptively tuning the translations of object and background.
We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency.
Motivated by the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon.
Ranked #11 on Text Spotting on SCUT-CTW1500
We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning.
In this paper, we address these two problems by constructing a Blurred Video Tracking benchmark, which contains a variety of videos with different levels of motion blurs, as well as ground truth tracking results for evaluating trackers.
Classical CNN based object detection methods only extract the objects' image features, but do not consider the high-level relationship among objects in context.
It is thus necessary to complete the sparse LiDAR data, where a synchronized guidance RGB image is often used to facilitate this completion.
However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views.
Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective, and formulate it in the form of attentive message passing on graphs, called neural consciousness flow (NeuCFlow).
Human-object interaction (HOI) recognition and pose estimation are two closely related tasks.
Real-world scenarios demand reasoning about process, more than final outcome prediction, to discover latent causal chains and better understand complex systems.
Then, the SCG can be trained based on these surrogate costs using standard backpropagation.
In particular, we design a Mask Weight Network (MWN) to learn a set of masks and then apply channel-wise masking operations to the ROI feature map, followed by global pooling and a cheap fully-connected layer.
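The pipeline described above (channel-wise masking, global pooling, then a fully-connected layer) can be sketched in NumPy as follows. This is only a shape-level illustration under stated assumptions: each mask is taken to be a vector of per-channel weights, global pooling is taken to be spatial averaging, and the function name `masked_pool_fc` and all parameter names are hypothetical. The masks and FC weights would be learned in practice; here they are plain inputs.

```python
import numpy as np


def masked_pool_fc(roi_feat, masks, fc_weight, fc_bias):
    """Apply K channel-wise masks to an ROI feature map, pool, then project.

    roi_feat:  (C, H, W) ROI feature map
    masks:     (K, C) per-channel weights, one row per mask (assumed layout)
    fc_weight: (D, K * C) fully-connected weights
    fc_bias:   (D,) fully-connected bias
    """
    # Broadcast each mask over the spatial dimensions: (K, C, H, W)
    masked = masks[:, :, None, None] * roi_feat[None, :, :, :]
    # Global average pooling over H and W: (K, C)
    pooled = masked.mean(axis=(2, 3))
    # Flatten and apply the single fully-connected layer: (D,)
    return fc_weight @ pooled.reshape(-1) + fc_bias


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    roi = rng.standard_normal((8, 7, 7))          # C=8 channels, 7x7 ROI
    masks = rng.uniform(size=(4, 8))              # K=4 masks
    w = rng.standard_normal((16, 4 * 8))          # D=16 outputs
    b = np.zeros(16)
    print(masked_pool_fc(roi, masks, w, b).shape)
```

Because the masks act per channel and the pooling is linear, the masking could equally be applied after pooling; the order above simply mirrors the description in the text.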
How to effectively learn temporal variation of target appearance, to exclude the interference of cluttered background, while maintaining real-time response, is an essential problem of visual object tracking.
Ranked #5 on Visual Object Tracking on OTB-2013
Based on an inexpensive platform with unreliable absolute repositioning accuracy (ARA), we propose a hand-eye-calibration-free strategy to actively relocate the camera into the same 6D pose that produced the input reference image, by sequentially correcting the 3D relative rotation and translation.
To guarantee detection sensitivity and accuracy for minute changes, in each observation we capture a group of images under multiple illuminations, which need only be roughly aligned to the lighting conditions of the previous observation.
From our study, we make some reasonable recommendations of combining existing methods that perform the best in different situations for this challenging problem.
For this purpose, we aim at constructing a maximum cohesive SP-grid, which is composed of real nodes, i.e., SPs, and dummy nodes that are meaningless in the image and serve only a position-taking function in the grid.