One-shot face re-enactment is a challenging task due to the identity mismatch between source and driving faces.
We propose a front-to-top view projection (FTVP) module, which takes the constraint of cycle consistency between views into account and makes full use of their correlation to strengthen the view transformation and scene understanding.
Little benefit was observed by adding frames more than one second away from the predicted transformation, with or without LSTM-based RNNs.
Facial Attribute Manipulation (FAM) aims to aesthetically modify a given face image to render desired attributes, which has received significant attention due to its broad practical applications ranging from digital entertainment to biometric forensics.
To solve this problem, we propose a Supervised Contrastive Learning (SCL) method with Tree-structured Parzen Estimator (TPE) technique for imbalanced tabular datasets.
Deep learning (DL) methods have been widely applied to anomaly-based network intrusion detection systems (NIDS) to detect malicious traffic.
We argue that when learning high-order information from temporal graphs, we encounter two challenges, i.e., computational inefficiency and over-smoothing, that cannot be solved by conventional techniques applied on static graphs.
How to mitigate the sampling bias for heterogeneous GCL is another important problem.
Understanding the origin and influence of the publication's idea is critical to conducting scientific research.
Existing studies on neural architecture search (NAS) mainly focus on efficiently and effectively searching for network architectures with better performance.
In review-based recommendation methods, review data is considered as auxiliary information that can improve the quality of learned user/item or interaction representations for the user rating prediction task.
To reduce human annotations for relation extraction (RE) tasks, distantly supervised approaches have been proposed, but they struggle with low performance.
Industrial recommender systems usually hold data from multiple business scenarios and are expected to provide recommendation services for these scenarios simultaneously.
Here, we evaluate whether analyzing a weighted dynamic aging-specific subnetwork inferred from newer GE and PPIN data improves prediction accuracy upon analyzing the best current subnetwork inferred from outdated data.
Therefore, only acoustic sensors (non-intrusive) need to be installed during the application phase, which is convenient and crucial for the condition monitoring of safety-critical infrastructure.
We report on the realization of a long-haul radio frequency (RF) transfer scheme by using multiple-access relay stations (MARSs).
AnyFace can achieve high-quality, high-resolution, and high-diversity face synthesis and manipulation results without any constraints on the number and content of input captions.
In addition, semantic information is introduced into the semantic-guided fusion module to control the swapped area and model the pose and expression more accurately.
Deepfake detection automatically recognizes manipulated media by analyzing the differences between manipulated and non-altered videos.
New COVID-19 epidemic strains like Delta and Omicron with increased transmissibility and pathogenicity emerge and spread across the whole world rapidly while causing high mortality during the pandemic period.
In this paper, we study the named entity recognition (NER) problem under distant supervision.
In this paper, we propose a novel method to extend ANN search to arbitrary matching functions, e.g., a deep neural network.
Furthermore, we design a novel Memory Refinement Loss (MR Loss) for feature alignment in the memory module, which enhances the accuracy of memory slots in an unsupervised manner.
This paper adopts the truth discovery idea to aggregate constituency parse trees from different parsers by estimating their reliability in the absence of ground truth.
In addition, to improve the fidelity of the generated results, we leverage the semantic layouts to construct two types of Representational Graphs which indicate the intra-class semantic features and inter-class structural features of the synthesized images.
While most works focus on single-object or agent-object visual functionality and affordances, our work proposes to study a new kind of visual relationship that is also important to perceive and model: inter-object functional relationships (e.g., a switch on the wall turns on or off the light, a remote control operates the TV).
For commercial cloud speech APIs, we propose Occam, a decision-only black-box adversarial attack, where only final decisions are available to the adversary.
Biphasic facial age translation aims at predicting the appearance of the input face at any age.
The proposed Aggregation method for Sequential Labels from Crowds ($AggSLC$) jointly considers the characteristics of sequential labeling tasks, workers' reliabilities, and advanced machine learning techniques.
In particular, we introduce a novel locality-aware context fusion based segmentation model to process local patches, where the relevance between a local patch and its various contexts is jointly and complementarily utilized to handle semantic regions with large variations.
In this work, we carefully revisit the standard training practice of detectors, and find that the detection performance is often limited by imbalance during the training process, which generally occurs at three levels: sample level, feature level, and objective level.
We also evaluate the effectiveness of our attack under two defenses: one is a well-designed adversarial graph detector, and the other is a target GNN model that is itself equipped with a defense to prevent adversarial graph generation.
To address this problem, we introduce QA-driven slot filling (QASF), which extracts slot-filler spans from utterances with a span-based QA model.
By introducing the method and metrics, we invite the community to study this novel map learning problem.
It aims to maximize the mutual dependencies between item content and collaborative signals.
For videos, such negative transfer could be triggered by both spatial and temporal features, which leads to a more challenging Partial Video Domain Adaptation (PVDA) problem.
To tackle these challenges, we propose a novel Semantic-Driven Generative Adversarial Network (SDGAN) which embeds global structure-level style injection and local class-level knowledge re-weighting.
To this end, we propose Whisper, a realtime ML based malicious traffic detection system that achieves both high accuracy and high throughput by utilizing frequency domain features.
Then, the color and location probability map of the moving area will be calculated through maximum a posteriori probability.
Furthermore, our model runs at 35 FPS on a single GPU, which is efficient and applicable for real-time panorama HD map reconstruction.
In deep learning, a typical strategy for transfer learning is to freeze the early layers of a pre-trained model and fine-tune the rest of its layers on the target domain.
1 code implementation • 2 Jun 2021 • Bo Peng, Hongxing Fan, Wei Wang, Jing Dong, Yuezun Li, Siwei Lyu, Qi Li, Zhenan Sun, Han Chen, Baoying Chen, Yanjie Hu, Shenghai Luo, Junrui Huang, Yutong Yao, Boyuan Liu, Hefei Ling, Guosheng Zhang, Zhiliang Xu, Changtao Miao, Changlei Lu, Shan He, Xiaoyan Wu, Wanyi Zhuang
This competition provides a common platform for benchmarking the adversarial game between current state-of-the-art DeepFake creation and detection methods.
Extensive experiments demonstrate the superiority of MegaFS, and the first megapixel-level face swapping database is released for research on DeepFake detection and face image editing in the public domain.
Ranked #7 on Face Swapping on FaceForensics++
Single image super-resolution (SISR) aims to reconstruct high-resolution (HR) images from the given low-resolution (LR) ones, which is an ill-posed problem because one LR image corresponds to multiple HR images.
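The ill-posedness mentioned above can be illustrated with a toy example. The sketch below is not taken from the paper: it assumes a simple block-averaging degradation and uses 1-D lists in place of images, and the names `downsample`, `hr_a`, and `hr_b` are hypothetical. It shows two different high-resolution signals that map to the same low-resolution signal, so the LR observation alone cannot determine which HR signal produced it.

```python
def downsample(hr, factor=2):
    """Average non-overlapping blocks of `factor` samples (assumed degradation model)."""
    return [sum(hr[i:i + factor]) / factor for i in range(0, len(hr), factor)]


# Two distinct "high-resolution" signals (hypothetical values).
hr_a = [1.0, 3.0, 5.0, 7.0]
hr_b = [2.0, 2.0, 6.0, 6.0]

# Both collapse to the same "low-resolution" signal under block averaging.
lr_a = downsample(hr_a)
lr_b = downsample(hr_b)

print(lr_a, lr_b, lr_a == lr_b)
```

Real degradations (blur, noise, bicubic resampling) are more complex, but the same many-to-one property holds, which is why SISR methods need priors or learned constraints.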
Few-shot learning arises in important practical scenarios, such as when a natural language understanding system needs to learn new semantic labels for an emerging, resource-scarce domain.
There are three main challenges in 3D object grounding: to find the main focus in the complex and diverse description; to understand the point cloud scene; and to locate the target object.
In the past decade, a variety of methods have been developed for subclonal reconstruction using bulk tumor sequencing data.
Extremely elongated, conducting dust particles (also known as metallic "needles" or "whiskers") are seen in carbonaceous chondrites and in samples brought back from the Itokawa asteroid.
Astrophysics of Galaxies
Our empirical results show that the proposed defenses can substantially reduce the estimation errors of the data poisoning attacks.
When receiving a user request, the matching system (i) finds the crowds that the user belongs to and (ii) retrieves all ads that have targeted those crowds.
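The two-step lookup described above can be sketched with an inverted index from crowds to ads. This is only an illustration under assumed data structures; the names `build_index`, `match_ads`, `USER_CROWDS`, and `AD_TARGETS` and all sample values are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def build_index(ad_targets):
    """Invert an ad -> crowds mapping into a crowd -> ads mapping."""
    index = defaultdict(set)
    for ad_id, crowds in ad_targets.items():
        for crowd in crowds:
            index[crowd].add(ad_id)
    return index


def match_ads(user_id, user_crowds, crowd_to_ads):
    """Step (i): look up the user's crowds. Step (ii): union the ads targeting them."""
    matched = set()
    for crowd in user_crowds.get(user_id, set()):
        matched |= crowd_to_ads.get(crowd, set())
    return matched


# Hypothetical sample data.
USER_CROWDS = {"u1": {"sports_fans", "new_parents"}, "u2": {"gamers"}}
AD_TARGETS = {
    "ad_a": {"sports_fans"},
    "ad_b": {"gamers", "new_parents"},
    "ad_c": {"travelers"},
}
CROWD_TO_ADS = build_index(AD_TARGETS)

print(sorted(match_ads("u1", USER_CROWDS, CROWD_TO_ADS)))
```

Building the index once keeps each request proportional to the number of crowds the user belongs to rather than to the total number of ads.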
Particularly, the localized adversarial examples only perturb a small and contiguous region of the target object, so that they are robust and effective in both digital and physical worlds.
Specifically, we formulate our attack as an optimization problem, such that the injected ratings would maximize the number of normal users to whom the target items are recommended.
Historical features are important in ads click-through rate (CTR) prediction, because they account for past engagements between users and ads.
Generative Adversarial Networks (GANs) with style-based generators (e.g., StyleGAN) successfully enable semantic control over image synthesis, and recent studies have also revealed that interpretable image translations could be obtained by modifying the latent code.
Our results show that the proposed OptSLA outperforms the state-of-the-art aggregation methods, and the results are easier to interpret.
The experiments on ImageNet verify that such a path distillation method can improve the convergence ratio and performance of the hypernetwork, as well as boost the training of subnetworks.
Data-driven quantitative defect reconstruction using ultrasonic guided waves has recently demonstrated great potential in the area of non-destructive testing.
To address this issue, a novel approach to quantitative reconstruction of defects using the integration of data-driven method with the guided wave scattering analysis has been proposed in this paper.
Computational Engineering, Finance, and Science J.2
In this paper, we benchmark the robustness of watermarking, and propose a novel backdoor-based watermark removal framework using limited data, dubbed WILD.
Using our data generation method and the proposed LSFNet, we can recover the details and color of the original scene, and improve the low-light image quality effectively.
EVIDENCEMINER is constructed in a completely automated way without any human effort for training data annotation.
Compared to methods with similar detectors, it improves MOTA by almost 10 points and significantly decreases the number of ID switches on the BDD100K and Waymo datasets.
Ranked #1 on One-Shot Object Detection on PASCAL VOC 2012 val
Instead, we recently inferred a dynamic aging-specific subnetwork using a methodologically more advanced notion of network propagation (NP), which improved upon Induced dynamic aging-specific subnetwork in a different task, that of unsupervised analyses of the aging process.
While typical named entity recognition (NER) models require the training set to be annotated with all target types, each available dataset may cover only a part of them.
The dynamics of human skeletons carry significant information for the task of action recognition.
This paper introduces a negative margin loss to metric learning based few-shot learning methods.
An encoder-decoder structure with a context block is introduced to capture multiscale information.
Ranked #7 on Image Denoising on DND (using extra training data)
In this paper, we propose a general framework to derive moment invariants under DAT for objects in M-dimensional space with N channels, which can be called dual-affine moment invariants (DAMI).
Face aging, which aims at aesthetically rendering a given face to predict its future appearance, has received significant research attention in recent years.
This added adversarial perturbation image is called an adversarial example, which poses a serious security problem for systems based on CNN model recognition results.
With the massive number of repositories available, there is a pressing need for topic-based search.
Large, pre-trained generative models have been increasingly popular and useful to both the research and wider communities.
In a systematic and comprehensive evaluation, we find that in many of the evaluation tests: (i) using an aging-specific subnetwork indeed yields more accurate aging-related gene predictions than using the entire network, and (ii) predictive methods from our framework that have not previously been used for supervised prediction of aging-related genes outperform existing prominent methods for the same purpose.
1 code implementation • Yimin Wang, Qi Li, Li-Juan Liu, Zhi Zhou, Zongcai Ruan, Lingsheng Kong, Yaoyao Li, Yun Wang, Ning Zhong, Renjie Chai, Xiangfeng Luo, Yike Guo, Michael Hawrylycz, Qingming Luo, Zhongze Gu, Wei Xie, Hongkui Zeng, Hanchuan Peng
Neuron morphology is recognized as a key determinant of cell type, yet the quantitative profiling of a mammalian neuron’s complete three-dimensional (3-D) morphology remains arduous when the neuron has complex arborization and long projection.
Then, an end-to-end pipeline is designed to jointly regress the proposed volumetric representation and the coordinate vector.
Ranked #3 on Face Alignment on AFLW2000-3D
Causality extraction from natural language texts is a challenging open problem in artificial intelligence.
Age progression and regression refer to aesthetically rendering a given face image to present the effects of face aging and rejuvenation, respectively.
We continue to see increasingly widespread deployment of IoT devices, with apparent intent to embed them in our built environment likely to accelerate if smart city and related programmes succeed.
Networking and Internet Architecture
Understanding the internal representations of deep neural networks (DNNs) is crucial to explain their behavior.
In this regard, we propose the Adversarial Feature Genome (AFG), a novel type of data that contains both the differences and features about classes.
Using our framework and a self-assembled dataset of 3D objects, we investigate the vulnerability of DNNs to OoD poses of well-known objects in ImageNet.
Understanding the internal representations of deep neural networks (DNNs) is crucial to explain their behavior.
Electricity consumption forecasting has important implications for mineral companies in guiding quarterly work, normal power system operation, and management.
Patients with medical information needs tend to post questions about their health conditions on these crowdsourced Q&A websites and get answers from other users.
Since it is difficult to collect face images of the same subject over a long range of age span, most existing face aging methods resort to unpaired datasets to learn age mappings.
According to the Liouville Theorem, an important part of the conformal transformation is the Möbius transformation, so we focus on the Möbius transformation and propose two differential expressions that are invariant under 2-D and 3-D Möbius transformations, respectively.
Then, a stacked hourglass network is adopted to estimate the volumetric representation from coarse to fine, followed by a 3D convolution network that takes the estimated volume as input and regresses 3D coordinates of the face shape.
Ranked #1 on 3D Facial Landmark Localization on AFLW2000-3D
To utilize both global and local facial information, we propose a Global and Local Consistent Age Generative Adversarial Network (GLCA-GAN).
Fabric image retrieval is beneficial to many applications including clothing searching, online shopping and cloth modeling.
These metrics are then fed to a neural network for supervised learning, whose weights are produced by a hybrid PSO and BP algorithm.
This Estimation-Correction-Tuning process combines the global robustness of the data-driven method (FCN), the outlier correction capability of the model-driven method (PDM), and the non-parametric optimization of RLMS.
As long as (2) is ensured, the performance of word segmentation does not have appreciable impact on Chinese and Japanese name tagging.