Priors play an important role in providing plausible constraints on human motion.
Pixel-level 2D object semantic understanding is an important topic in computer vision and could help machines deeply understand objects (e.g., their functionality and affordance) in our daily life.
Motivated by the above facts, we propose a novel and fully end-to-end trainable approach, called regional contrastive consistency regularization (RCCR) for domain adaptive semantic segmentation.
To address this issue, we formulate this task as a Spatial-Temporal Inconsistency Learning (STIL) process and instantiate it into a novel STIL block, which consists of a Spatial Inconsistency Module (SIM), a Temporal Inconsistency Module (TIM), and an Information Supplement Module (ISM).
Extensive experiments demonstrate that our method can substantially boost the performance of state-of-the-art techniques on both cross-domain object detection and segmentation.
Inspired by this, we propose a novel semi-supervised framework based on pseudo-labeling for outdoor 3D object detection tasks.
In this paper, we study a practical setting called Specific Domain Adaptation (SDA) that aligns the source and target domains in a demand-specific dimension.
The generated contextual mask is critical in this work and will guide the domain mixup.
However, little attention has been paid to the feature extraction process for the FAS task, especially the influence of normalization, which also has a great impact on the generalization of the learned representation.
Face anti-spoofing approaches based on domain generalization (DG) have drawn growing attention due to their robustness for unseen scenarios.
Moreover, our model, trained in a non-adversarial manner, is more stable during training than other adversarial novelty detection methods.
Recently, DETR and Deformable DETR have been proposed to eliminate the need for many hand-designed components in object detection while demonstrating performance comparable to previous complex hand-crafted detectors.
Our method can significantly improve the backbones on all three datasets.
In this paper, we propose a novel contrastive regularization (CR) built upon contrastive learning to exploit both the information of hazy images and clear images as negative and positive samples, respectively.
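A minimal sketch of such a contrastive regularization term, assuming some upstream feature extractor has already produced feature vectors (the function name and toy vectors below are hypothetical, not the paper's actual implementation): the restored image's features are pulled toward the clear (positive) sample and pushed away from the hazy (negative) one.

```python
import numpy as np

def contrastive_regularization(feat_restored, feat_clear, feat_hazy, eps=1e-8):
    """Hypothetical sketch: ratio of the distance to the positive (clear)
    sample over the distance to the negative (hazy) sample; smaller is better."""
    d_pos = np.linalg.norm(feat_restored - feat_clear)  # distance to positive
    d_neg = np.linalg.norm(feat_restored - feat_hazy)   # distance to negative
    return d_pos / (d_neg + eps)

# toy feature vectors standing in for deep features
restored = np.array([1.0, 2.0])
clear    = np.array([1.1, 2.1])
hazy     = np.array([5.0, 5.0])
loss = contrastive_regularization(restored, clear, hazy)
```

Minimizing this ratio jointly enforces closeness to the clear image and separation from the hazy one, which is the intuition behind using both as supervision.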
The Information Bottleneck (IB) provides an information-theoretic principle for representation learning, by retaining all information relevant for predicting the label while minimizing the redundancy.
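The IB principle above is commonly written as the following trade-off (a standard formulation, not necessarily the exact objective used by this paper): learn a representation \(Z\) of input \(X\) that is maximally informative about label \(Y\) while compressing away the rest of \(X\):

```latex
\max_{p(z \mid x)} \; I(Z; Y) \;-\; \beta \, I(X; Z)
```

Here \(I(\cdot;\cdot)\) denotes mutual information and \(\beta > 0\) controls how strongly redundant information about \(X\) is penalized.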
Abnormal event detection is a challenging task that requires effectively handling intricate features of appearance and motion.
Spherical Voxel Convolution and Point Re-sampling are proposed to extract rotation invariant features for each point.
In this paper, we present a novel purified memory mechanism that simulates the recognition process of human beings.
Boundary information plays a significant role in 2D image segmentation, but it is usually ignored in 3D point cloud segmentation, where ambiguous features may be generated during feature extraction, leading to misclassification in the transition areas between objects.
This allows the supervision to be aligned with the property of saliency detection, where the salient objects of an image could be from more than one class.
In this work, we disentangle the direct offset into Local Canonical Coordinates (LCC), box scales, and box orientations.
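To illustrate the disentanglement, here is a hedged 2D sketch (the function name and the use of a single yaw angle are assumptions for illustration; the paper works with 3D boxes): the offset is recomposed from its canonical coordinates, scale, and orientation.

```python
import numpy as np

def compose_offset(lcc, scale, yaw):
    """Hypothetical sketch: recover a 2D offset from its disentangled parts --
    local canonical coordinates (lcc), box scale, and box orientation (yaw)."""
    c, s = np.cos(yaw), np.sin(yaw)
    rot = np.array([[c, -s], [s, c]])       # rotate canonical frame into world frame
    return rot @ (np.asarray(lcc) * scale)  # scale first, then rotate

# a unit canonical offset, scaled by 2 and rotated 90 degrees
offset = compose_offset(lcc=[1.0, 0.0], scale=2.0, yaw=np.pi / 2)
```

Predicting the three factors separately, rather than the raw offset, lets each head specialize and keeps the targets in a normalized, pose-invariant frame.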
Face anti-spoofing is crucial to the security of face recognition systems.
The brain structures in the collected data are complicated; hence, doctors must spend considerable effort when diagnosing brain abnormalities.
To capture the underlying structure of live face data in the latent representation space, we propose to train on live face data only, with a convolutional encoder-decoder network acting as a Generator.
7 May 2020 • Codruta O. Ancuti, Cosmin Ancuti, Florin-Alexandru Vasluianu, Radu Timofte, Jing Liu, Haiyan Wu, Yuan Xie, Yanyun Qu, Lizhuang Ma, Ziling Huang, Qili Deng, Ju-Chin Chao, Tsung-Shan Yang, Peng-Wen Chen, Po-Min Hsu, Tzu-Yi Liao, Chung-En Sun, Pei-Yuan Wu, Jeonghyeok Do, Jongmin Park, Munchurl Kim, Kareem Metwaly, Xuelu Li, Tiantong Guo, Vishal Monga, Mingzhao Yu, Venkateswararao Cherukuri, Shiue-Yuan Chuang, Tsung-Nan Lin, David Lee, Jerome Chang, Zhan-Han Wang, Yu-Bang Chang, Chang-Hong Lin, Yu Dong, Hong-Yu Zhou, Xiangzhen Kong, Sourya Dipta Das, Saikat Dutta, Xuan Zhao, Bing Ouyang, Dennis Estrada, Meiqi Wang, Tianqi Su, Siyi Chen, Bangyong Sun, Vincent Whannou de Dravo, Zhe Yu, Pratik Narang, Aryan Mehra, Navaneeth Raghunath, Murari Mandal
We focus on the proposed solutions and their results evaluated on NH-Haze, a novel dataset consisting of 55 pairs of real haze-free and nonhomogeneous hazy images recorded outdoors.
Most existing expression manipulation methods resort to discrete expression labels, which mainly edit global expressions and ignore the manipulation of fine details.
Visual semantic correspondence is an important topic in computer vision and could help machines understand objects in our daily life.
Guided by this mask, we propose a ClassOut strategy to realize effective regional consistency in a fine-grained manner.
Instead, leveraging the disagreement between different models is key to locating pseudo-label errors.
In this paper, we propose a novel version of Gated Recurrent Unit (GRU), called Single Tunnelled GRU for abnormality detection.
In this paper, we introduce body part segmentation as critical supervision.
Moreover, to extract precise local features, we propose an adaptive attention learning module to refine the attention map of each AU adaptively.
Although huge progress has been made on scene analysis in recent years, most existing works assume the input images are captured in daytime with good lighting conditions.
Detecting keypoints of 3D objects is of great interest to both the graphics and computer vision communities.
Anomaly detection is a fundamental problem in computer vision with many real-world applications.
Acoustic anomaly detection aims at distinguishing abnormal acoustic signals from the normal ones.
One-class novelty detection is the process of determining if a query example differs from the training examples (the target class).
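To make the definition concrete, here is a deliberately minimal, generic baseline (not this paper's method): score a query by its distance to the centroid of the target-class training features. The feature extractor is assumed to exist upstream; plain vectors stand in for deep features.

```python
import numpy as np

def novelty_score(query, train_feats):
    """Generic sketch: distance of a query to the mean of target-class
    training features; a higher score suggests a more novel sample."""
    center = train_feats.mean(axis=0)
    return float(np.linalg.norm(query - center))

# toy target-class training features and two queries
train = np.array([[0.0, 0.0], [0.2, 0.1], [-0.1, 0.1]])
in_dist  = novelty_score(np.array([0.05, 0.05]), train)  # near the target class
out_dist = novelty_score(np.array([5.0, 5.0]), train)    # far from the target class
```

Real methods replace this centroid distance with richer models of the target class (reconstruction error, density estimates, and so on), but the query-versus-training-set comparison is the same.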
Besides local features, global information plays an essential role in semantic segmentation, yet recent works usually fail to explicitly extract meaningful global information and make full use of it.
Specifically, we introduce a spatio-temporal graph convolutional network to capture both spatial and temporal relations from dynamic AUs, in which the AU relations are formulated as a spatio-temporal graph with adaptively learned rather than predefined edge weights.
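A single graph-convolution step with learned edge weights can be sketched as follows (shapes, the row-softmax normalization, and the function name are illustrative assumptions, not the paper's exact architecture): the adjacency is derived from a trainable logit matrix instead of a fixed graph.

```python
import numpy as np

def graph_conv(H, A_logits, W):
    """One graph-convolution step with adaptively learned edge weights.
    H: (N, F) node features, A_logits: (N, N) learnable edge logits,
    W: (F, F_out) feature transform."""
    # row-wise softmax turns logits into normalized, learnable edge weights
    A = np.exp(A_logits) / np.exp(A_logits).sum(axis=1, keepdims=True)
    return A @ H @ W  # aggregate over neighbors, then transform features

rng = np.random.default_rng(0)
H = rng.standard_normal((4, 3))         # 4 AU nodes, 3 features each
A_logits = rng.standard_normal((4, 4))  # learnable edge logits
W = rng.standard_normal((3, 2))
out = graph_conv(H, A_logits, W)
```

Because `A_logits` is trainable, the AU relation graph is discovered from data; stacking such layers along the temporal axis yields the spatio-temporal variant.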
Semantic understanding of 3D objects is crucial in many applications such as object manipulation.
Instead of using an intermediate estimated guidance, we propose to explicitly transfer facial expression by directly mapping two unpaired input images to two synthesized images with swapped expressions.
We then introduce a proposal generation network to predict 3D region proposals from the generated maps and further extrude objects of interest from the whole point cloud.
Due to the combination of source AU-related information and target AU-free information, the latent feature domain with the transferred source labels can be learned by maximizing the target-domain AU detection performance.
As a way to significantly reduce model size and computation time, binarized neural networks have so far been shown to excel only on semantic-level tasks such as image classification and recognition.
Point cloud analysis without pose priors is very challenging in real applications, as the orientations of point clouds are often unknown.
By finding the region of interest of each AU with the attention mechanism, AU-related local features can be captured.
Most existing deep learning methods use only one fully-connected layer, called the shape prediction layer, to estimate the locations of facial landmarks.
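Such a shape prediction layer amounts to a single linear map from pooled image features to all landmark coordinates at once; the following sketch assumes 68 landmarks and a 128-dimensional feature vector purely for illustration.

```python
import numpy as np

def shape_prediction_layer(features, W, b, n_landmarks):
    """Sketch of a single fully-connected 'shape prediction layer':
    one linear map from image features to all 2-D landmark coordinates."""
    coords = features @ W + b            # (batch, 2 * n_landmarks)
    return coords.reshape(-1, n_landmarks, 2)

rng = np.random.default_rng(0)
feats = rng.standard_normal((1, 128))        # pooled CNN features (assumed size)
W = rng.standard_normal((128, 68 * 2)) * 0.01
b = np.zeros(68 * 2)
landmarks = shape_prediction_layer(feats, W, b, n_landmarks=68)
```

Predicting every landmark from one shared linear layer is what makes this design limited: all spatial structure must be absorbed by the preceding features, which motivates coarse-to-fine alternatives.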
In this paper, we propose a two-stage depth ranking based method (DRPose3D) to tackle the problem of 3D human pose estimation.
The task of face attribute manipulation has found increasing applications, but it remains challenging to edit the attributes of a face image while preserving its unique details.
Facial action unit (AU) detection and face alignment are two highly correlated tasks since facial landmarks can provide precise AU locations to facilitate the extraction of meaningful local features for AU detection.
Traditional approaches to interpolating/extrapolating frames in a video sequence require accurate pixel correspondences between images, e.g., using optical flow.
In this paper, we propose a novel face alignment method that trains deep convolutional networks from coarse to fine.