Visual object tracking is an important task in computer vision with many real-world applications, e.g., video surveillance and visual navigation.
Active camera relocalization (ACR) is a new problem in computer vision that significantly reduces the false alarms caused by image distortions due to camera pose misalignment in fine-grained change detection (FGCD).
As a result, the final method takes advantage of effective semantic and image-level filling for high-fidelity inpainting.
The key challenges of LML image recognition are the construction of label relationships on the partial labels of the training data and catastrophic forgetting on old classes, both of which result in poor generalization.
To obtain a more comprehensive understanding of activity in a crowded scene, in this paper we propose a new problem, panoramic human activity recognition (PAR), which aims to simultaneously achieve individual action, social group activity, and global activity recognition.
The core of human group detection is the human social relation representation and division. In this paper, we propose a new two-stage multi-head framework for human group detection.
Mobile edge computing (MEC) is considered a novel paradigm for computation-intensive and delay-sensitive tasks in fifth generation (5G) networks and beyond.
First, we propose the uncertainty-aware cascaded predictive filtering (UC-PFilt) that can identify the difficulties of reconstructing clean pixels via predicted kernels and remove the residual rain traces effectively.
In this paper, we develop a new approach that can simultaneously handle three tasks: i) localizing the side-view camera in the top view; ii) estimating the view direction of the side-view camera; iii) detecting and associating the same subjects on the ground across the complementary views.
Our results show that Markov dependence affects the generalization error in that the sample size is discounted by a multiplicative factor depending on the spectral gap of the underlying Markov chain.
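To make the role of the spectral gap concrete, here is a minimal sketch for a two-state Markov chain, where the gap has a closed form. The function names are hypothetical, and the discount rule (effective sample size equals sample size times the gap) is an assumption made purely for illustration; the snippet above does not state the exact multiplicative factor.

```python
def two_state_spectral_gap(a: float, b: float) -> float:
    """Absolute spectral gap of the chain P = [[1 - a, a], [b, 1 - b]].

    The eigenvalues of P are 1 and 1 - a - b, so the absolute
    spectral gap is 1 - |1 - a - b|.
    """
    if not (0.0 <= a <= 1.0 and 0.0 <= b <= 1.0):
        raise ValueError("transition probabilities must lie in [0, 1]")
    return 1.0 - abs(1.0 - a - b)


def discounted_sample_size(n: int, gap: float) -> float:
    """Illustrative effective sample size.

    ASSUMPTION: the discount factor is taken to be the gap itself.
    The source only says the factor depends on the spectral gap.
    """
    if not (0.0 <= gap <= 1.0):
        raise ValueError("gap must lie in [0, 1]")
    return n * gap
```

With `a = b = 0.5` the next state is independent of the current one, the gap is 1, and no samples are discounted. A slowly mixing chain such as `a = b = 0.1` has a small gap, so under this assumed rule far fewer samples count.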
The observation of this work motivates us to design a novel detection-aware shadow removal framework, which empowers shadow removal to achieve higher restoration quality and enhance the shadow robustness of deployed facial landmark detectors.
Traditionally, the primary goal of LL is to achieve a trade-off between stability (remembering past tasks) and plasticity (adapting to new tasks).
To this end, we first propose the physical model-based adversarial relighting attack (ARA) denoted as albedo-quotient-based adversarial relighting attack (AQ-ARA).
Our DID-Net predicts the three component maps by progressively integrating features across scales, and refines each map by passing it through an independent refinement network.
Ranked #2 on Image Dehazing on Haze4k
In this work, we explore unsupervised domain adaptation in retinal vessel segmentation by using entropy-based adversarial learning and a transfer normalization layer to train a segmentation network that generalizes well across domains and requires no annotation of the target domain.
3D point cloud completion is very challenging because it relies heavily on an accurate understanding of complex 3D shapes (e.g., high-curvature, concave/convex, and hollowed-out 3D shapes) and of the unknown and diverse patterns of the partially available point clouds.
To overcome the lack of character-level annotations, we propose a novel weakly-supervised character center detection module, which only uses word-level annotated real images to generate character-level labels.
We also visualize the correlation matrices, which inspire us to jointly apply different perturbations to improve the success rate of the attack.
We conduct extensive experiments on the ISTD, ISTD+, and SRD datasets to validate our method's effectiveness and show better performance in shadow regions and comparable performance in non-shadow regions over the state-of-the-art methods.
Nevertheless, the communication and MEC systems are coupled with each other under the influence of the complex propagation environment in the MEC-empowered NTN, which makes it hard to orchestrate the resources.
The Gravity Recovery and Climate Experiment (GRACE) satellite and its successor GRACE Follow-On (GRACE-FO) provide valuable and accurate observations of terrestrial water storage anomalies (TWSAs) at a global scale.
Moreover, comprehensive evaluations have demonstrated two important properties of our method: First, superior transferability across DNNs.
Rehearsal, seeking to remind the model by storing old knowledge in lifelong learning, is one of the most effective ways to mitigate catastrophic forgetting, i.e., biased forgetting of previous knowledge when moving to new tasks.
We propose an Expectation-Maximization (EM) based weakly-supervised learning framework to train an accurate arbitrary-shaped text detector using only a small amount of polygon-level annotated data combined with a large amount of weakly annotated data.
In this paper, we address this problem from the perspective of adversarial attacks and identify a novel task: adversarial co-saliency attack.
To fill this gap, in this paper we regard single-image deraining as a general image-enhancing problem and propose a model-free deraining method, EfficientDeRain, which is able to process a rainy image within 10 ms (around 6 ms on average), over 80 times faster than the state-of-the-art method (RCDNet), while achieving similar deraining effects.
To defend the DNNs from the negative rain effect, we also present a defensive deraining strategy, for which we design an adversarial rain augmentation that uses mixed adversarial rain layers to enhance deraining models for downstream DNN perception.
Only a few research efforts have been devoted to other random delay characteristics, such as the delay bound violation probability and the probability distribution of the delay, by decoupling the transmission and computation processes of MEC.
In fifth generation (5G) and beyond Internet of Things (IoT), it becomes increasingly important to serve a massive number of IoT devices outside the coverage of terrestrial cellular networks.
To this end, we make the first attempt to study this problem from the perspective of adversarial attacks and propose the adversarial denoise attack.
As GAN-based face image and video generation techniques, widely known as DeepFakes, have become increasingly mature and realistic, there is a pressing demand for effective DeepFakes detectors.
In this paper, we study the integration of UAVs with existing MCNs, and investigate the potential gains of hybrid satellite-UAV-terrestrial networks for maritime coverage.
Paired egocentric interaction recognition (PEIR) is the task of collaboratively recognizing the interactions between two persons using the videos from their respective views.
In this paper, we argue that for REC the referring expression and the target region are semantically correlated, and that subject, location, and relationship consistency exists between vision and language. On top of this, we propose a novel approach called MutAtt to construct mutual guidance between vision and language, which treats vision and language equally and thus yields compact information matching.
We also theoretically prove the invariance of our ALR approach to the ambiguity of normal and lighting decomposition.
By using the public data from Jan. 20 to Feb. 11, 2020, we perform data-driven analysis and forecasting on the COVID-19 epidemic in mainland China, especially Hubei province.
In addition, the attack is further enhanced by adaptively tuning the translations of the object and the background.
We identify that online object tracking poses two new challenges: 1) it is difficult to generate imperceptible perturbations that can transfer across frames, and 2) real-time trackers require the attack to satisfy a certain level of efficiency.
Motivated by the name of TextSnake, which is only a detection model, we call the proposed text spotting framework TextDragon.
We propose Dynamically Pruned Message Passing Networks (DPMPN) for large-scale knowledge graph reasoning.
In this paper, we address these two problems by constructing a Blurred Video Tracking benchmark, which contains a variety of videos with different levels of motion blur, as well as ground-truth tracking results for evaluating trackers.
Classical CNN-based object detection methods extract only the image features of objects and do not consider the high-level relationships among objects in context.
It is thus necessary to complete the sparse LiDAR data, where a synchronized guidance RGB image is often used to facilitate this completion.
However, for such collaborative analysis, the first step is to associate people, referred to as subjects in this paper, across these two views.
Instead, inspired by the consciousness prior proposed by Yoshua Bengio, we explore reasoning with the notion of attentive awareness from a cognitive perspective, and formulate it in the form of attentive message passing on graphs, called neural consciousness flow (NeuCFlow).
Human-object interaction (HOI) recognition and pose estimation are two closely related tasks.
Real-world scenarios demand reasoning about process, more than final outcome prediction, to discover latent causal chains and better understand complex systems.
In particular, we design a Mask Weight Network (MWN) to learn a set of masks and then apply channel-wise masking operations to the ROI feature map, followed by global pooling and a cheap fully-connected layer.
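The pipeline described above (channel-wise masking, global pooling, then a fully-connected layer) can be sketched in plain Python. This is only an illustration under stated assumptions: the mask is a fixed per-channel weight vector rather than one learned by the MWN, pooling is taken to be global average pooling, and all function names are hypothetical.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # indexed as [channel][row][col]


def apply_channel_mask(roi: FeatureMap, mask: List[float]) -> FeatureMap:
    """Scale every value in channel c by mask[c] (channel-wise masking)."""
    if len(mask) != len(roi):
        raise ValueError("mask needs one weight per channel")
    return [[[w * v for v in row] for row in channel]
            for channel, w in zip(roi, mask)]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Collapse each channel to its spatial mean (assumed pooling type)."""
    return [sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
            for channel in fmap]


def fully_connected(x: List[float], weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Plain dense layer: one output per weight row, no activation."""
    return [sum(w * v for w, v in zip(w_row, x)) + b
            for w_row, b in zip(weights, bias)]


def masked_roi_head(roi: FeatureMap, mask: List[float],
                    weights: List[List[float]],
                    bias: List[float]) -> List[float]:
    """Masking -> global pooling -> fully-connected layer."""
    pooled = global_average_pool(apply_channel_mask(roi, mask))
    return fully_connected(pooled, weights, bias)
```

Because pooling reduces each channel to a single number, the fully-connected layer only sees a vector with one entry per channel, which is what makes it cheap compared with a dense layer over the full ROI feature map.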
How to effectively learn temporal variation of target appearance, to exclude the interference of cluttered background, while maintaining real-time response, is an essential problem of visual object tracking.
Ranked #5 on Visual Object Tracking on OTB-2013
Based on an inexpensive platform with unreliable absolute repositioning accuracy (ARA), we propose a hand-eye-calibration-free strategy to actively relocate the camera into the same 6D pose that produced the input reference image, by sequentially correcting the 3D relative rotation and translation.
To guarantee detection sensitivity and accuracy for minute changes, in each observation we capture a group of images under multiple illuminations, which need only be roughly aligned to the lighting conditions of the previous observation.
From our study, we make some reasonable recommendations of combining existing methods that perform the best in different situations for this challenging problem.
For this purpose, we aim at constructing a maximum cohesive SP-grid, which is composed of real nodes, i.e., SPs, and dummy nodes that are meaningless in the image and serve only as position holders in the grid.