no code implementations • 8 Mar 2024 • Jianzong Wang, Pengcheng Li, Xulong Zhang, Ning Cheng, Jing Xiao
After combining the intent from two domains into a joint representation, the integrated intent representation is fed into a decision layer for classification.
no code implementations • 16 Jan 2024 • Haobin Tang, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
We introduce ED-TTS, a multi-scale emotional speech synthesis model that leverages Speech Emotion Diarization (SED) and Speech Emotion Recognition (SER) to model emotions at different levels.
no code implementations • 16 Jan 2024 • Bingyuan Zhang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao, Jianzong Wang
In recent years, the field of talking face generation has attracted considerable attention, with certain methods adept at generating virtual faces that convincingly imitate human expressions.
no code implementations • 15 Nov 2023 • Jianzong Wang, Yimin Deng, ZiQi Liang, Xulong Zhang, Ning Cheng, Jing Xiao
This paper proposes a talking face generation method named "CP-EB" that takes an audio signal as input and a person image as reference to synthesize a photo-realistic video of a person talking, with head poses controlled by a short video clip and with proper eye blinking embedding.
no code implementations • 28 Sep 2023 • Wenting Liu, Zhaozhong Gui, Guilin Jiang, Lihua Tang, Lichun Zhou, Wan Leng, Xulong Zhang, Yujiang Liu
With the increasing volume of high-frequency data in the information age, both challenges and opportunities arise in the prediction of stock volatility.
no code implementations • 19 Sep 2023 • Hao Guo, Hongbiao Si, Guilin Jiang, Wei Zhang, Zhiyan Liu, Xuanyi Zhu, Xulong Zhang, Yang Liu
Moreover, various methods utilize attention in semantic segmentation, but a summary of these methods is lacking.
no code implementations • 16 Sep 2023 • Yazhong Si, Xulong Zhang, Fan Yang, Jianzong Wang, Ning Cheng, Jing Xiao
Most existing sandstorm image enhancement methods are based on traditional theory and prior knowledge, which often restrict their applicability in real-world scenarios.
no code implementations • 14 Sep 2023 • Zipeng Qi, Xulong Zhang, Ning Cheng, Jing Xiao, Jianzong Wang
Generating realistic talking faces is a complex and widely discussed task with numerous applications.
no code implementations • 28 Aug 2023 • Xulong Zhang, Jianzong Wang, Ning Cheng, Yifu Sun, Chuanyao Zhang, Jing Xiao
The rise of the "right to be forgotten" has prompted research on machine unlearning, which grants data owners the right to actively withdraw data that has been used for model training and requires that the contribution of that data to the model be eliminated.
no code implementations • 14 Mar 2023 • Kexin Zhu, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Using deep learning methods to classify EEG signals can accurately identify people's emotions.
no code implementations • 14 Mar 2023 • Xulong Zhang, Haobin Tang, Jianzong Wang, Ning Cheng, Jian Luo, Jing Xiao
Because they predict all target tokens in parallel, non-autoregressive models greatly improve the decoding efficiency of speech recognition compared with traditional autoregressive models.
no code implementations • 14 Mar 2023 • Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Recent expressive text-to-speech (TTS) models focus on synthesizing emotional speech, but some fine-grained styles such as intonation are neglected.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Mengyuan Zhao, Zhiyong Zhang, Jing Xiao
We also find that in the joint CTC-Attention ASR model, the decoder is more sensitive to linguistic information than to acoustic information.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Most previous neural text-to-speech (TTS) methods are based on supervised learning, which means they depend on a large training dataset and struggle to achieve comparable performance under low-resource conditions.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Recent advances in pre-trained language models have improved performance on text classification tasks.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
In this paper, we propose Adapitch, a multi-speaker TTS method that adapts the supervised module using untranscribed data.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
The Metaverse expands the physical world to a new dimension, and the physical and Metaverse environments can be directly connected and entered.
no code implementations • 25 Oct 2022 • Xulong Zhang, Jianzong Wang, Ning Cheng, Kexin Zhu, Jing Xiao
In this work, we propose two masking approaches: (1) speech-level masking, which makes the model mask more speech segments than silence segments, and (2) phoneme-level masking, which forces the model to mask all frames of a phoneme rather than pieces of it.
no code implementations • 13 Oct 2022 • Aolan Sun, Xulong Zhang, Tiandong Ling, Jianzong Wang, Ning Cheng, Jing Xiao
Since the beginning of the COVID-19 pandemic, remote conferencing and school-teaching have become important tools.
no code implementations • 21 Sep 2022 • Shijing Si, Jianzong Wang, Xulong Zhang, Xiaoyang Qu, Ning Cheng, Jing Xiao
Nonparallel multi-domain voice conversion methods such as the StarGAN-VCs have been widely applied in many scenarios.
no code implementations • 8 Aug 2022 • Huaizhen Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Zhen Zeng, Edward Xiao, Jing Xiao
In this paper, a novel voice conversion framework, named Text Guided AutoVC (TGAVC), is proposed to more effectively separate content and timbre from speech, where an expected content embedding produced based on the text transcriptions is designed to guide the extraction of voice content.
no code implementations • 29 Sep 2021 • Tang Huaizhen, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Voice conversion (VC) aims to convert one speaker's voice into new speech that sounds as if it were spoken by another speaker.
no code implementations • 9 Apr 2020 • Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao
Most singer identification methods operate in the frequency domain, which potentially leads to information loss during the spectral transformation.
no code implementations • 8 Apr 2020 • Yifu Sun, Xulong Zhang, Yi Yu, Xi Chen, Wei Li
Singing voice detection (SVD), which identifies the vocal parts of a song, is an essential task in music information retrieval (MIR).