Search Results for author: Mengxiao Bi

Found 8 papers, 0 papers with code

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

no code implementations • 2 Apr 2024 • Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan

Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the application and entertainment of the talking head generation.

Disentanglement Talking Head Generation

Paper
Add Code

DualVC 2: Dynamic Masked Convolution for Unified Streaming and Non-Streaming Voice Conversion

no code implementations • 27 Sep 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Shuai Wang, Jixun Yao, Lei Xie, Mengxiao Bi

Third, the model is unable to effectively address the noise in the unvoiced segments, lowering the sound quality.

Knowledge Distillation Voice Conversion

Paper
Add Code

Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models

no code implementations • 21 Aug 2023 • Heyang Xue, Shuai Guo, Pengcheng Zhu, Mengxiao Bi

Despite imperfect score-matching causing drift in training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example.

Paper
Add Code

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

no code implementations • 21 May 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities.

Data Augmentation Knowledge Distillation +1

Paper
Add Code

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input.

Voice Conversion

Paper
Add Code

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes model extremely easy to over-fit to training data and causes low speaker similarity and lack of expressiveness.

Style Transfer Voice Conversion

Paper
Add Code

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score.

Singing Voice Synthesis Variational Inference

Paper
Add Code

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations • 26 Feb 2018 • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody.

speech-recognition Speech Recognition +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.