Search Results for author: Xulong Zhang

Found 24 papers, 0 papers with code

Medical Speech Symptoms Classification via Disentangled Representation

no code implementations • 8 Mar 2024 • Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao

After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification.

Classification

ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis

no code implementations • 16 Jan 2024 • Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.

Denoising, Emotional Speech Synthesis, +1

EmoTalker: Emotionally Editable Talking Face Generation via Diffusion Model

no code implementations • 16 Jan 2024 • Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang

In recent years, the field of talking face generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions.

Denoising, Talking Face Generation

CP-EB: Talking Face Generation with Controllable Pose and Eye Blinking Embedding

no code implementations • 15 Nov 2023 • Jianzong Wang, Yimin Deng, ZiQi Liang, Xulong Zhang, Ning Cheng, Jing Xiao

This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference to synthesize a photo-realistic video of a talking person, with head poses controlled by a short video clip and a proper eye-blinking embedding.

Talking Face Generation

Stock Volatility Prediction Based on Transformer Model Using Mixed-Frequency Data

no code implementations • 28 Sep 2023 • Wenting Liu, Zhaozhong Gui, Guilin Jiang, Lihua Tang, Lichun Zhou, Wan Leng, Xulong Zhang, Yujiang Liu

With the increasing volume of high-frequency data in the information age, both challenges and opportunities arise in the prediction of stock volatility.

An Empirical Study of Attention Networks for Semantic Segmentation

no code implementations • 19 Sep 2023 • Hao Guo, Hongbiao Si, Guilin Jiang, Wei Zhang, Zhiyan Liu, Xuanyi Zhu, Xulong Zhang, Yang Liu

Moreover, various methods use attention in semantic segmentation, but conclusions about these methods are lacking.

Segmentation, Semantic Segmentation

AOSR-Net: All-in-One Sandstorm Removal Network

no code implementations • 16 Sep 2023 • Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao

Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.

Image Enhancement, Image Restoration

DiffTalker: Co-driven audio-image diffusion for talking faces via intermediate landmarks

no code implementations • 14 Sep 2023 • Zipeng Qi, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang

Generating realistic talking faces is a complex and widely discussed task with numerous applications.

Face Generation

Machine Unlearning Methodology base on Stochastic Teacher Network

no code implementations • 28 Aug 2023 • Xulong Zhang, Jianzong Wang, Ning Cheng, Yifu Sun, Chuanyao Zhang, Jing Xiao

The rise of the phenomenon of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training, and requires the elimination of the contribution of that data to the model.

Machine Unlearning

Dynamic Alignment Mask CTC: Improved Mask-CTC with Aligned Cross Entropy

no code implementations • 14 Mar 2023 • Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao

Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.

Position, Sentence, +2

QI-TTS: Questioning Intonation Control for Emotional Speech Synthesis

no code implementations • 14 Mar 2023 • Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent expressive text to speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected.

Emotional Speech Synthesis, Sentence

Semi-Supervised Learning Based on Reference Model for Low-resource TTS

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions.

Speech Synthesis

Improving Imbalanced Text Classification with Dynamic Curriculum Learning

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Recent advances in pre-trained language models have improved the performance for text classification tasks.

Scheduling, text-classification, +1

Adapitch: Adaption Multi-Speaker Text-to-Speech Conditioned on Pitch Disentangling with Untranscribed Data

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module using untranscribed data.

Disentanglement

MetaSpeech: Speech Effects Switch Along with Environment for Metaverse

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

The Metaverse expands the physical world to a new dimension, and the physical environment and the Metaverse environment can be directly connected and entered.

Voice Conversion

Improving Speech Representation Learning via Speech-level and Phoneme-level Masking Approach

no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao

In this work, we propose two masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme instead of phoneme pieces.

Representation Learning, Speaker Recognition

Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar

no code implementations • 13 Oct 2022 • Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao

Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools.

Boosting Star-GANs for Voice Conversion with Contrastive Discriminator

no code implementations • 21 Sep 2022 • Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao

Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios.

Contrastive Learning, Voice Conversion

TGAVC: Improving Autoencoder Voice Conversion with Text-Guided and Adversarial Training

no code implementations • 8 Aug 2022 • Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao

In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced from the text transcriptions is designed to guide the extraction of voice content.

Voice Conversion

MDCNN-SID: Multi-scale Dilated Convolution Network for Singer Identification

no code implementations • 9 Apr 2020 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Most singer identification methods are processed in the frequency domain, which potentially leads to information loss during the spectral transformation.

Artist classification, Music Generation, +1

Investigation of Singing Voice Separation for Singing Voice Detection in Polyphonic Music

no code implementations • 8 Apr 2020 • Yifu Sun, Xulong Zhang, Yi Yu, Xi Chen, Wei Li

Singing voice detection (SVD), which recognizes the vocal parts of a song, is an essential task in music information retrieval (MIR).

Information Retrieval, Melody Extraction, +2
