A complete 3D face reconstruction requires explicitly modeling the eyeglasses on the face, a problem that has received little attention in the literature.
Although significant progress has been made in audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects.
The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer.
In our technique, the motion of visible regions is first estimated and combined with temporal information to infer the motion of the occluded regions through an LSTM-involved graph neural network.
Given the recent surge of interest in data-driven control, this paper proposes a two-step method to study robust data-driven control for a linear time-invariant (LTI) system with unknown parameters that is affected by energy-bounded noise.
For a linear descriptor system with unknown parameters, this paper proposes data-driven methods to determine the system's type and controllability and then to stabilize it.
Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides to predict the metastatic status of ALN preoperatively for patients with EBC.
Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to that in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
We also evaluate FTROJAN against state-of-the-art defenses as well as several adaptive defenses that are designed in the frequency domain.
In this demo, we present VirtualConductor, a system that can generate a conducting video from any given piece of music and a single image of the user.
For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique.
This risk model based on the m6A-based lncRNAs may be promising for the clinical prediction of prognoses and immunotherapeutic responses in LUAD patients.
In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.
To develop such an approach, a higher-order tensor is constructed whose factor matrices contain the sources' azimuth and elevation information.
no code implementations • 9 Mar 2021 • Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun Fu
In this paper, we propose a novel benchmark dataset, named FAIR1M, with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery.
A computationally efficient tensor decomposition method is proposed to decompose the Vandermonde factor matrices.
Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part.
We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image.
We develop a new tensor model for slow-time multiple-input multiple-output (MIMO) radar and apply it to joint direction-of-departure (DOD) and direction-of-arrival (DOA) estimation.
Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data.
We present a novel method for monocular hand shape and pose estimation with an unprecedented runtime performance of 100 fps and state-of-the-art accuracy.
This paper aims to address this scalability challenge with a robust, sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new but related long-horizon task robustly given only a single video demonstration.
In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.
We propose a real-time DNN-based technique to segment the hand and the object during interacting motions from depth inputs.
This paper attempts to develop machine intelligence that is trainable with large volumes of co-registered SAR and optical images and can translate SAR images into optical versions to assist SAR image interpretation.
Experiments show that our patch deformation method improves the accuracy of feature tracking, and our 3D reconstruction outperforms the state-of-the-art solutions under fast camera motions.
Consumer depth sensors are increasingly popular and are entering our daily lives, as marked by their recent integration into the latest iPhone X.
Even worse, noise is also present in the video frames, since the background of each frame exhibits subpixel-level, uneven movement caused by the motion of the satellites.
In this paper, we focus on a more challenging and ill-posed problem: synthesizing novel viewpoints from a single input image.
In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with.
Our method decomposes the semantic style transfer problem into a feature reconstruction part and a feature decoder part.
To reduce the ambiguities of the non-rigid deformation parameterization on the surface graph nodes, we take advantage of the internal articulated motion prior for human performance and contribute a skeleton-embedded surface fusion (SSF) method.
Both qualitative and quantitative experiments with real fully polarimetric data are conducted to show the efficacy of the proposed method.
We present a new motion tracking method to robustly reconstruct non-rigid geometries and motions from single-view depth inputs captured by a consumer depth sensor.
Community Question Answering (CQA) websites have become valuable repositories which host a massive volume of human knowledge.