Search Results for author: Benlai Tang

Found 11 papers, 2 papers with code

Prior-agnostic Multi-scale Contrastive Text-Audio Pre-training for Parallelized TTS Frontend Modeling

no code implementations • 14 Apr 2024 • Quanxiu Wang, Hui Huang, Mingjie Wang, Yong Dai, Jinzuomu Zhong, Benlai Tang

Furthermore, a parallelized TTS frontend model is carefully designed to perform the text normalization (TN), polyphone disambiguation (PD), and prosodic boundary prediction (PBP) tasks in the second stage (a minimal multi-task sketch follows this entry).

Polyphone disambiguation
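
The entry above describes a single frontend model handling TN, PD, and PBP in parallel rather than as a cascade. Below is a minimal, hypothetical sketch of that idea: a shared text encoder whose token encodings feed three parallel task heads. Module choices, dimensions, and label counts are illustrative assumptions, not the paper's architecture.

```python
# Hypothetical sketch: one shared text encoder, three parallel task heads.
# Dimensions, tag counts, and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class ParallelTTSFrontend(nn.Module):
    def __init__(self, vocab_size=8000, d_model=256,
                 n_tn_tags=20, n_polyphone_prons=1500, n_pbp_levels=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        # One head per frontend task, all reading the same token encodings.
        self.tn_head = nn.Linear(d_model, n_tn_tags)          # text normalization tags
        self.pd_head = nn.Linear(d_model, n_polyphone_prons)  # candidate pronunciations
        self.pbp_head = nn.Linear(d_model, n_pbp_levels)      # prosodic boundary levels

    def forward(self, token_ids):                              # token_ids: (B, T)
        h = self.encoder(self.embed(token_ids))                # (B, T, d_model)
        return self.tn_head(h), self.pd_head(h), self.pbp_head(h)

frontend = ParallelTTSFrontend()
tn, pd, pbp = frontend(torch.randint(0, 8000, (2, 32)))        # three task outputs at once
```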

Multi-Modal Automatic Prosody Annotation with Contrastive Pretraining of SSWP

1 code implementation • 11 Sep 2023 • Jinzuomu Zhong, Yang Li, Hui Huang, Jie Liu, Zhiba Su, Jing Guo, Benlai Tang, Fengjie Zhu

While human prosody annotation contributes substantially to performance, it is a labor-intensive and time-consuming process that often yields inconsistent results.

TranssionADD: A multi-frame reinforcement based sequence tagging model for audio deepfake detection

no code implementations • 27 Jun 2023 • Jie Liu, Zhiba Su, Hui Huang, Caiyan Wan, Quanxiu Wang, Jiangli Hong, Benlai Tang, Fengjie Zhu

We propose the novel TranssionADD system to address the challenging problems of model robustness and audio-segment outliers in the trace competition (a generic frame-level tagging sketch follows this entry).

Data Augmentation · DeepFake Detection · +1
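
The title above frames audio deepfake detection as sequence tagging, i.e. labeling individual frames so that manipulated segments can be localized. The sketch below illustrates only that general formulation with an assumed BiLSTM tagger over mel features; it is not the TranssionADD model or its multi-frame reinforcement scheme.

```python
# Assumed formulation only: a BiLSTM labels every mel frame as real or fake,
# which lets partially spoofed segments be localized. Not the TranssionADD model.
import torch
import torch.nn as nn

class FrameTagger(nn.Module):
    def __init__(self, n_mels=80, hidden=256, n_labels=2):
        super().__init__()
        self.rnn = nn.LSTM(n_mels, hidden, num_layers=2,
                           batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_labels)  # per-frame real/fake logits

    def forward(self, mel):                                 # mel: (B, T, n_mels)
        h, _ = self.rnn(mel)
        return self.classifier(h)                           # (B, T, n_labels)

logits = FrameTagger()(torch.randn(4, 400, 80))             # 4 clips, 400 frames each
frame_labels = logits.argmax(dim=-1)                        # assumed convention: 0 = real, 1 = fake
```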

CPNet: Exploiting CLIP-based Attention Condenser and Probability Map Guidance for High-fidelity Talking Face Generation

no code implementations • 23 May 2023 • Jingning Xu, Benlai Tang, Mingjie Wang, Minghao Li, Meirong Ma

Recently, talking face generation has drawn ever-increasing attention from the computer vision research community due to its arduous challenges and widespread application scenarios, e.g., movie animation and virtual anchors.

Talking Face Generation

Towards Realistic Visual Dubbing with Heterogeneous Sources

no code implementations • 17 Jan 2022 • Tianyi Xie, Liucheng Liao, Cheng Bi, Benlai Tang, Xiang Yin, Jianfei Yang, Mingjie Wang, Jiali Yao, Yang Zhang, Zejun Ma

The task of few-shot visual dubbing focuses on synchronizing the lip movements with arbitrary speech input for any talking head video.

Disentanglement · Talking Head Generation

Towards Using Clothes Style Transfer for Scenario-aware Person Video Generation

1 code implementation • 14 Oct 2021 • Jingning Xu, Benlai Tang, Mingjie Wang, Siyuan Bian, Wenyi Guo, Xiang Yin, Zejun Ma

To tackle this problem, recent AdaIN-based architectures have been proposed to extract clothes and scenario features for generation (the AdaIN operation is sketched after this entry).

Style Transfer · Video Generation
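
For reference, adaptive instance normalization (AdaIN), the operation the entry above relies on, re-normalizes content features so they carry the channel-wise statistics of style features (here, clothes or scenario features). The snippet below is a minimal, generic AdaIN, not the paper's full generator; shapes are illustrative.

```python
# Generic AdaIN: shift content feature statistics to match style feature statistics.
import torch

def adain(content, style, eps=1e-5):
    # content, style: (B, C, H, W) feature maps from an encoder
    c_mean = content.mean(dim=(2, 3), keepdim=True)
    c_std = content.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style.mean(dim=(2, 3), keepdim=True)
    s_std = style.std(dim=(2, 3), keepdim=True) + eps
    return s_std * (content - c_mean) / c_std + s_mean

stylized = adain(torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32))
```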

Towards High-fidelity Singing Voice Conversion with Acoustic Reference and Contrastive Predictive Coding

no code implementations • 10 Oct 2021 • Chao Wang, Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Yibiao Yu, Zejun Ma

Experiments show that, compared with the baseline models, our proposed model significantly improves the naturalness of the converted singing voices and their similarity to the target singer.

Automatic Speech Recognition (ASR) · +2

PPG-based singing voice conversion with adversarial representation learning

no code implementations • 28 Oct 2020 • Zhonghao Li, Benlai Tang, Xiang Yin, Yuan Wan, Ling Xu, Chen Shen, Zejun Ma

Singing voice conversion (SVC) aims to convert one singer's voice to that of another singer while preserving the singing content and melody (a generic PPG-based pipeline is sketched after this entry).

Representation Learning · Voice Conversion · +1
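
In a PPG-based SVC setup like the one named above, an external ASR acoustic model produces phonetic posteriorgrams (PPGs) from the source singing, and a conversion model combines them with melody and singer identity to predict acoustic features. The sketch below is a rough placeholder pipeline under that assumption; the dimensions and modules are illustrative, not the paper's implementation.

```python
# Placeholder pipeline: PPGs carry linguistic content, log-F0 carries melody,
# a singer embedding carries timbre; the model maps them to mel-spectrograms
# that a separate vocoder would render. Dimensions and modules are assumptions.
import torch
import torch.nn as nn

class PPGConverter(nn.Module):
    def __init__(self, ppg_dim=218, spk_dim=128, hidden=256, n_mels=80):
        super().__init__()
        self.proj = nn.Linear(ppg_dim + 1 + spk_dim, hidden)  # PPG + log-F0 + speaker
        self.rnn = nn.GRU(hidden, hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, n_mels)

    def forward(self, ppg, log_f0, spk_emb):
        # ppg: (B, T, ppg_dim), log_f0: (B, T, 1), spk_emb: (B, spk_dim)
        spk = spk_emb.unsqueeze(1).expand(-1, ppg.size(1), -1)
        x = torch.relu(self.proj(torch.cat([ppg, log_f0, spk], dim=-1)))
        h, _ = self.rnn(x)
        return self.out(h)                                     # predicted mel frames

mel = PPGConverter()(torch.randn(1, 200, 218), torch.randn(1, 200, 1), torch.randn(1, 128))
```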

Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech

no code implementations • 19 May 2020 • Wenjie Li, Benlai Tang, Xiang Yin, Yushi Zhao, Wei Li, Kang Wang, Hao Huang, Yuxuan Wang, Zejun Ma

Accent conversion (AC) transforms a non-native speaker's accent into a native accent while maintaining the speaker's voice timbre.

ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders

no code implementations • 23 Apr 2020 • Yu Gu, Xiang Yin, Yonghui Rao, Yuan Wan, Benlai Tang, Yang Zhang, Jitong Chen, Yuxuan Wang, Zejun Ma

This paper presents ByteSing, a Chinese singing voice synthesis (SVS) system based on duration-allocated Tacotron-like acoustic models and WaveRNN neural vocoders (the duration-allocation step is sketched after this entry).

Singing Voice Synthesis
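
"Duration allocated" here plausibly refers to length-regulating the phoneme-level encodings to the frame durations given by the musical score before decoding. The helper below sketches that assumed mechanism only; it is not ByteSing's actual code.

```python
# Assumed mechanism: repeat each phoneme encoding for as many frames as the
# score allocates to it, producing a frame-level sequence for the decoder.
import torch

def allocate_durations(encoder_out, durations):
    # encoder_out: (T_phoneme, d); durations: (T_phoneme,) frame counts per phoneme
    return torch.repeat_interleave(encoder_out, durations, dim=0)  # (sum(durations), d)

enc = torch.randn(5, 256)                   # encodings for 5 phonemes from the lyrics/score
dur = torch.tensor([3, 8, 2, 6, 4])         # frames allocated to each phoneme
frames = allocate_durations(enc, dur)       # (23, 256), consumed by the acoustic decoder
```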
