no code implementations • 14 Apr 2024 • Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang
Furthermore, in the second stage, a parallelized TTS frontend model is devised to perform the TN, PD, and PBP prediction tasks.
1 code implementation • 11 Sep 2023 • Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu
While human prosody annotation contributes a lot to the performance, it is a labor-intensive and time-consuming process, often resulting in inconsistent outcomes.
no code implementations • 27 Jun 2023 • Jie Liu, Zhiba Su, Hui Huang, Caiyan Wan, Quanxiu Wang, Jiangli Hong, Benlai Tang, Fengjie Zhu
We propose our novel TranssionADD system as a solution to the challenging problem of model robustness and audio segment outliers in the trace competition.
no code implementations • 23 May 2023 • Jingning Xu, Benlai Tang, Mingjie Wang, Minghao Li, Meirong Ma
Recently, talking face generation has drawn ever-increasing attention from the computer vision research community due to its arduous challenges and widespread application scenarios, e.g., movie animation and virtual anchors.
no code implementations • 17 Jan 2022 • Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma
The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.
1 code implementation • 14 Oct 2021 • Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma
To tackle this problem, recent AdaIN-based architectures have been proposed to extract clothes and scenario features for generation.
no code implementations • 10 Oct 2021 • Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma
Experiments show that, compared with the baseline models, our proposed model can significantly improve the naturalness of converted singing voices and the similarity with the target singer.
Automatic Speech Recognition (ASR)
no code implementations • 28 Oct 2020 • Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma
Singing voice conversion (SVC) aims to convert the voice of one singer to that of other singers while keeping the singing content and melody.
no code implementations • 19 May 2020 • Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma
Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.
no code implementations • 23 Apr 2020 • Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma
This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration-allocated Tacotron-like acoustic models and WaveRNN neural vocoders.
no code implementations • 6 Dec 2018 • Qiao Tian, Bing Yang, Jing Chen, Benlai Tang, Shan Liu
Firstly, due to the noisy input signal of the model, there is still a gap between the quality of generated and natural waveforms.
Generative Adversarial Network • Vocal Bursts Intensity Prediction