Search Results for author: Mengxiao Bi

Found 8 papers, 0 papers with code

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

no code implementations • 2 Apr 2024 • Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan

Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhance the application and entertainment value of talking head generation.

Disentanglement • Talking Head Generation

Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models

no code implementations • 21 Aug 2023 • Heyang Xue, Shuai Guo, Pengcheng Zhu, Mengxiao Bi

Although imperfect score matching causes a drift between the training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example.
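
As context for the score-matching issue mentioned above, here is a minimal sketch of a denoising score-matching training step for a diffusion acoustic model; the score_model interface, the noise schedule, and the tensor shapes are illustrative assumptions, not details taken from the Multi-GradSpeech paper.

    import torch

    def score_matching_step(score_model, mel, text_cond, optimizer):
        """One training step: learn the score of Gaussian-noised mel-spectrograms."""
        batch = mel.size(0)
        # Sample a diffusion time t in (0, 1] and a simple (assumed) noise schedule.
        t = torch.rand(batch, device=mel.device).clamp(min=1e-5)
        sigma = 0.1 + 0.9 * t
        noise = torch.randn_like(mel)
        noisy_mel = mel + sigma.view(-1, 1, 1) * noise        # mel: (B, n_mels, frames)
        # The network estimates the score, i.e. the gradient of log p(noisy_mel | cond).
        score = score_model(noisy_mel, t, text_cond)
        # True score of the Gaussian perturbation kernel: -(noisy_mel - mel) / sigma^2.
        target = -noise / sigma.view(-1, 1, 1)
        loss = ((score - target) ** 2).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()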

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

no code implementations • 21 May 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities.

Data Augmentation • Knowledge Distillation +1

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features serve as the attention query; these features are produced by a prosody encoder that takes the target speaker embedding and the normalized pitch and energy of the source speech as input.

Voice Conversion
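
A minimal sketch of the attention fusion described in the Expressive-VC snippet above, with prosody features as the query and the bottleneck (linguistic) plus perturbation (para-linguistic) features as keys and values; the dimensions and module names are illustrative, not the authors' exact architecture.

    import torch
    import torch.nn as nn

    class AttentionFusion(nn.Module):
        def __init__(self, dim=256, heads=4):
            super().__init__()
            self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

        def forward(self, prosody_q, linguistic_kv, paralinguistic_kv):
            # All inputs are (B, T, dim); keys/values pool both feature streams.
            kv = torch.cat([linguistic_kv, paralinguistic_kv], dim=1)
            fused, _ = self.attn(query=prosody_q, key=kv, value=kv)
            return fused

    # Example usage with random 100-frame features.
    fusion = AttentionFusion()
    q = torch.randn(2, 100, 256)    # prosody-encoder output (query)
    lin = torch.randn(2, 100, 256)  # bottleneck (linguistic) features
    par = torch.randn(2, 100, 256)  # perturbation (para-linguistic) features
    out = fusion(q, lin, par)       # -> (2, 100, 256)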

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on a single utterance makes the model extremely prone to over-fitting the training data, which leads to low speaker similarity and a lack of expressiveness.

Style Transfer • Voice Conversion

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates the audio waveform from lyrics and a musical score.

Singing Voice Synthesis • Variational Inference
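
A rough sketch of the objective implied by the VISinger title, combining a VAE-style reconstruction and KL term with a generator-side adversarial term; the loss weights and module interfaces are assumptions, not the published VISinger configuration.

    import torch
    import torch.nn.functional as F

    def generator_loss(recon_wav, target_wav, mu, logvar, disc_fake_logits,
                       kl_weight=1.0, adv_weight=1.0):
        # Reconstruction: L1 on waveforms (mel-domain losses are also common).
        recon = F.l1_loss(recon_wav, target_wav)
        # KL divergence between q(z|x) = N(mu, exp(logvar)) and N(0, I).
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        # Least-squares GAN term on the generator side: the discriminator
        # should score the generated audio as real (close to 1).
        adv = torch.mean((disc_fake_logits - 1.0) ** 2)
        return recon + kl_weight * kl + adv_weight * adv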

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations • 26 Feb 2018 • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of the generated speech, especially the naturalness of its prosody.

Speech Recognition +1
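
A minimal sketch of the memory-block idea behind (deep) feed-forward sequential memory networks: each layer augments its hidden state with a learnable weighted sum of nearby past and future frames, implemented here as per-dimension taps over a symmetric context window plus a skip connection, with no recurrence. The context order and projection layout are illustrative, not the paper's configuration.

    import torch
    import torch.nn as nn

    class FSMNMemoryBlock(nn.Module):
        """Hidden state plus a learnable weighted sum of neighbouring frames."""
        def __init__(self, dim=256, order=20):
            super().__init__()
            # Depthwise 1-D conv over time: one set of taps per feature dimension,
            # covering `order` past and `order` future frames.
            self.taps = nn.Conv1d(dim, dim, kernel_size=2 * order + 1,
                                  padding=order, groups=dim, bias=False)

        def forward(self, h):                    # h: (B, T, dim)
            context = self.taps(h.transpose(1, 2)).transpose(1, 2)
            return h + context                   # skip connection keeps gradients short

    # Example: a 200-frame utterance with 256-dimensional hidden states.
    block = FSMNMemoryBlock()
    h = torch.randn(4, 200, 256)
    out = block(h)                               # -> (4, 200, 256)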
