We demonstrate ViDA-MAN, a digital-human agent for multi-modal interaction that offers real-time audio-visual responses to instant speech inquiries.
Deep person generation has attracted extensive research attention due to its wide applications in virtual agents, video conferencing, online shopping and art/movie production.
Simulation results show that the proposed DMM performs better than the existing distributed Gauss-Newton method (DGN) in terms of root mean square error (RMSE) under a low communication overhead constraint.
This paper develops an unmanned aerial vehicle (UAV) deployment scheme in the context of directional modulation-based secure precise wireless transmission (SPWT), where the optimal UAV position for the SPWT is derived to maximize the secrecy rate (SR) without injecting any artificial noise (AN) signaling.
With the increasing popularity of calcium imaging data in neuroscience research, methods for analyzing calcium trace data are critical to address various questions.
Then, multiple RISs are utilized to achieve SPWT through the reflection paths among the transmitter, RISs, and receivers, enhancing communication performance and energy efficiency simultaneously.
To fully exploit the supervision in the source domain, we propose a fine-grained adversarial learning strategy for class-level feature alignment while preserving the internal structure of semantics across domains.
Ranked #15 on Image-to-Image Translation on SYNTHIA-to-Cityscapes
The stunning progress in face manipulation methods has made it possible to synthesize realistic fake face images, which poses potential threats to our society.
In this work, we show that such adversarial-based methods can only reduce the domain style gap, but cannot address the domain content distribution gap that is shown to be important for object detectors.
To tackle the unsupervised domain adaptation problem, we explore the possibility of generating high-quality labels as proxy labels to supervise training on target data.
The proposed regional robust schemes are designed to optimize the secrecy performance over the whole error region around the estimated location.
In this work, we focus on weak supervision, developing a method for training a high-quality pixel-level classifier for semantic segmentation using only image-level class labels as the provided ground truth.
In our work, we focus on weakly supervised semantic segmentation with image-level label annotations.
Training a Fully Convolutional Network (FCN) for semantic segmentation requires a large number of masks with pixel-level labelling, which involves a large amount of human labour and time for annotation.
Semantic image segmentation is a fundamental task in image understanding.