A complete 3D face reconstruction requires explicitly modeling the eyeglasses on the face, which has been less investigated in the literature.
Using an explicit "dynamic" tri-plane as an efficient container for parameterized head geometry, which aligns well with factors in the underlying geometry and tri-plane, we obtain aligned canonical factors for the canonical Gaussians.
We believe that the combination is complementary and able to address the inherent difficulties of using a single input modality, including occlusions, extreme lighting/texture, and out-of-view subjects for visual mocap, and global drift for inertial mocap.
Ranked #1 on 3D Human Pose Estimation on AIST++
SAR images are highly sensitive to observation configurations, and they exhibit significant variations across different viewing angles, making it challenging to represent and learn their anisotropic features.
Radar imaging is crucial in remote sensing and has many applications in detection and autonomous driving.
The constant interplay and information exchange between cells and their micro-environment are essential to their survival and ability to execute biological functions.
In this article, a new concept of a Tandem Dual-Antenna SAR Interferometry (TDA-InSAR) system for single-pass, reliable 3D surface mapping using asymptotic 3D phase unwrapping (PU) is proposed.
Further, three novel granule fusion strategies are utilized to combine granules into stable cluster structures, helping to detect clusters with arbitrary shapes.
1 code implementation • 5 May 2023 • Chuang Zhu, ShengJie Liu, Zekuan Yu, Feng Xu, Arpit Aggarwal, Germán Corredor, Anant Madabhushi, Qixun Qu, Hongwei Fan, Fangda Li, Yueheng Li, Xianchao Guan, Yongbing Zhang, Vivek Kumar Singh, Farhan Akram, Md. Mostafa Kamal Sarker, Zhongyue Shi, Mulan Jin
For invasive breast cancer, immunohistochemical (IHC) techniques are often used to detect the expression level of human epidermal growth factor receptor-2 (HER2) in breast tissue to formulate a precise treatment plan.
We integrate the two techniques together in EgoLocate, a system that simultaneously performs human motion capture (mocap), localization, and mapping in real time from sparse body-mounted sensors, including 6 inertial measurement units (IMUs) and a monocular phone camera.
This paper proposes the first 3D morphable face reflectance model with spatially varying BRDF using only low-cost, publicly available data.
In this paper, we propose a novel active imitation learning framework based on a teacher-student interaction model, in which the teacher's goal is to identify the best teaching behavior and actively affect the student's learning process.
To achieve this, we propose a scene analysis method to detect and initialize key points by considering the dynamics in the scene, and a weighted key points strategy to model topologically varying dynamics by jointly optimizing key points and weights.
By supervising shadow rays, we successfully reconstruct a neural SDF of the scene from single-view images under multiple lighting conditions.
Deep neural networks (DNNs) are widely used in many industries such as image recognition, supply chain, medical diagnosis, and autonomous driving.
Single view-based reconstruction of hand-object interaction is challenging due to severe missing observations caused by occlusions.
Achieving highly accurate dynamic or simulator models that are close to the real robot can facilitate model-based controls (e.g., model predictive control or linear-quadratic regulators), model-based trajectory planning (e.g., trajectory optimization), and decrease the amount of learning time necessary for reinforcement learning methods.
Existing defenses are mainly built upon the observation that the backdoor trigger is usually of small size or affects the activation of only a few neurons.
Generalization across different scenarios and different users is an urgent problem for millimeter-wave gesture recognition in indoor fiber-to-the-room (FTTR) scenarios.
Previous works on morphable models mostly focus on large-scale facial geometry but ignore facial details.
Therefore, we propose a novel Music Motion Synchronized Generative Adversarial Network (M2S-GAN), which generates motions according to the automatically learned music representations.
Spiking neural networks (SNNs) are brain-inspired machine learning algorithms with merits such as biological plausibility and unsupervised learning capability.
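The basic unit of many SNNs is the leaky integrate-and-fire (LIF) neuron; a minimal sketch of its update rule is below. The decay constant and threshold are illustrative assumptions, not values from the paper.

```python
def lif_step(v, x, decay=0.9, v_th=1.0):
    """One step of a leaky integrate-and-fire neuron.

    v: membrane potential, x: input current.
    Returns (new potential, spike flag).
    """
    v = decay * v + x   # leaky integration of the input
    if v >= v_th:       # fire when the threshold is crossed
        return 0.0, True  # reset the potential after a spike
    return v, False

# A constant input drives the neuron to spike periodically.
v = 0.0
spikes = []
for _ in range(10):
    v, s = lif_step(v, 0.5)
    spikes.append(s)
```

With this input and decay, the neuron charges for two steps and fires on every third, so the spike train is periodic.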
Although significant progress has been made to audio-driven talking face generation, existing methods either neglect facial emotion or cannot be applied to arbitrary subjects.
The evaluation of human epidermal growth factor receptor 2 (HER2) expression is essential to formulate a precise treatment for breast cancer.
Ranked #1 on Image-to-Image Translation on BCI
In our technique, the motion of visible regions is first estimated and combined with temporal information to infer the motion of the occluded regions through an LSTM-involved graph neural network.
Given the recent surge of interest in data-driven control, this paper proposes a two-step method to study robust data-driven control for a parameter-unknown linear time-invariant (LTI) system that is affected by energy-bounded noises.
For a parameter-unknown linear descriptor system, this paper proposes data-driven methods to verify the system's type and controllability and then to stabilize it.
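A common first step in such data-driven methods is recovering the system matrices from input-state data by least squares. The sketch below is a generic noiseless illustration with an arbitrary two-state system, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)
# Ground-truth system, unknown to the identification step below.
A_true = np.array([[0.9, 0.2], [0.0, 0.8]])
B_true = np.array([[0.0], [1.0]])

# Collect input-state data by exciting the system with random inputs.
T = 20
X = np.zeros((2, T + 1))
U = rng.standard_normal((1, T))
for t in range(T):
    X[:, t + 1] = A_true @ X[:, t] + (B_true @ U[:, t:t + 1]).ravel()

# Recover [A B] from X_+ = [A B] [X; U] via the pseudoinverse.
D = np.vstack([X[:, :T], U])
AB = X[:, 1:] @ np.linalg.pinv(D)
```

With noise-free data and persistently exciting inputs, the recovered `AB` matches `[A_true B_true]` up to numerical precision.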
Conclusion: Our study provides a novel DL-based biomarker on primary tumor CNB slides to predict the metastatic status of ALN preoperatively for patients with EBC.
Experiments on MuJoCo and Hand Manipulation Suite tasks show that agents deployed with our method achieve performance similar to that in the source domain, while those deployed with previous methods designed for same-modal domain adaptation suffer a larger performance gap.
We also evaluate FTROJAN against state-of-the-art defenses as well as several adaptive defenses that are designed in the frequency domain.
In this demo, we present VirtualConductor, a system that can generate conducting video from any given music and a single user's image.
In reinforcement learning, experience replay stores past samples for further reuse.
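A minimal experience replay buffer can be sketched as follows; this is a generic illustration of the concept, not any particular paper's implementation.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer that stores past transitions for reuse."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)  # oldest samples evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform re-sampling of past experience breaks the temporal
        # correlation between consecutive training samples.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)

buf = ReplayBuffer(capacity=100)
for t in range(10):
    buf.push(t, 0, 1.0, t + 1, False)
batch = buf.sample(4)
```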
For global translation estimation, we propose a supporting-foot-based method and an RNN-based method to robustly solve for the global translations with a confidence-based fusion technique.
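Confidence-based fusion of two translation estimates can be illustrated with a simple weighted average; the weighting scheme here is an assumption for illustration, not the paper's exact formulation.

```python
def fuse(t_foot, c_foot, t_rnn, c_rnn):
    """Confidence-weighted fusion of two global-translation estimates.

    t_*: per-axis translation estimates, c_*: scalar confidences > 0.
    """
    w = c_foot + c_rnn
    return [(c_foot * a + c_rnn * b) / w for a, b in zip(t_foot, t_rnn)]

# When the supporting-foot estimate is more confident, the fused
# result stays closer to it.
fused = fuse([1.0, 0.0, 0.0], 0.8, [2.0, 0.0, 0.0], 0.2)
```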
This risk model based on the m6A-based lncRNAs may be promising for the clinical prediction of prognoses and immunotherapeutic responses in LUAD patients.
In this work, we present Emotional Video Portraits (EVP), a system for synthesizing high-quality video portraits with vivid emotional dynamics driven by audio.
To develop such an approach, a higher-order tensor is constructed whose factor matrices contain the sources' azimuth and elevation information.
no code implementations • 9 Mar 2021 • Xian Sun, Peijin Wang, Zhiyuan Yan, Feng Xu, Ruiping Wang, Wenhui Diao, Jin Chen, Jihao Li, Yingchao Feng, Tao Xu, Martin Weinmann, Stefan Hinz, Cheng Wang, Kun Fu
In this paper, we propose a novel benchmark dataset with more than 1 million instances and more than 15,000 images for Fine-grAined object recognItion in high-Resolution remote sensing imagery, which is named FAIR1M.
A computationally efficient tensor decomposition method is proposed to decompose the Vandermonde factor matrices.
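A Vandermonde factor matrix, whose structure such decompositions exploit, has columns that are geometric progressions of generator values. A minimal construction is below; the generator values are arbitrary, chosen only for illustration.

```python
import numpy as np

def vandermonde(generators, rows):
    """Build a Vandermonde matrix: column k is [1, z_k, z_k**2, ...]."""
    z = np.asarray(generators)
    return z[np.newaxis, :] ** np.arange(rows)[:, np.newaxis]

# Complex-exponential generators, as arise from uniform linear arrays.
z = np.exp(1j * np.array([0.3, 0.7]))
V = vandermonde(z, rows=4)
# Shift invariance: dropping the first row equals dropping the last
# row and scaling each column by its generator.
```

This shift-invariance property is what makes Vandermonde-structured factors identifiable with inexpensive linear-algebraic operations.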
Information Theory Signal Processing
Domain adaptation is a promising direction for deploying RL agents in real-world applications, where vision-based robotics tasks constitute an important part.
We present the first method for real-time full body capture that estimates shape and motion of body and hands together with a dynamic 3D face model from a single color image.
Ranked #11 on 3D Hand Pose Estimation on FreiHAND
We develop a new tensor model for slow-time multiple-input multiple-output (MIMO) radar and apply it for joint direction-of-departure (DOD) and direction-of-arrival (DOA) estimation.
Deep learning in remote sensing has become an international hype, but it is mostly limited to the evaluation of optical data.
Face recognition under this situation is referred to as single sample face recognition and poses significant challenges to the effective training of deep models.
We present a novel method for monocular hand shape and pose estimation at unprecedented runtime performance of 100 fps and at state-of-the-art accuracy.
This paper aims to address this scalability challenge with a robust, sample-efficient, and general meta-IRL algorithm, SQUIRL, that performs a new but related long-horizon task robustly given only a single video demonstration.
In this paper, we investigate a novel problem of telling the difference between image pairs in natural language.
We propose a real-time DNN-based technique to segment hand and object of interacting motions from depth inputs.
This paper attempts to develop machine intelligence that is trainable with large-volume co-registered SAR and optical images to translate SAR images into optical versions for assisted SAR image interpretation.
Experiments show that our patch deformation method improves the accuracy of feature tracking, and our 3D reconstruction outperforms the state-of-the-art solutions under fast camera motions.
Consumer depth sensors are more and more popular and are coming into our daily lives, marked by their recent integration in the latest iPhone X.
Even worse, noise signals also exist in the video frames, since the background of each frame exhibits subpixel-level, uneven motion due to the movement of the satellites.
In this paper, we focus on a more challenging and ill-posed problem that is to synthesize novel viewpoints from one single input image.
In this article, we analyze the challenges of using deep learning for remote sensing data analysis, review the recent advances, and provide resources to make deep learning in remote sensing ridiculously simple to start with.
Our method decomposes the semantic style transfer problem into a feature reconstruction part and a feature decoder part.
To reduce the ambiguities of the non-rigid deformation parameterization on the surface graph nodes, we take advantage of the internal articulated motion prior for human performance and contribute a skeleton-embedded surface fusion (SSF) method.
Both qualitative and quantitative experiments with real fully polarimetric data are conducted to show the efficacy of the proposed method.
We present a new motion tracking method to robustly reconstruct non-rigid geometries and motions from single view depth inputs captured by a consumer depth sensor.
Community Question Answering (CQA) websites have become valuable repositories which host a massive volume of human knowledge.