Search Results for author: Mengxiao Bi

Found 12 papers, 0 papers with code

Revealing Directions for Text-guided 3D Face Editing

no code implementations • 7 Oct 2024 • Zhuo Chen, Yichao Yan, Shengqi Liu, Yuhao Cheng, Weiming Zhao, Lincheng Li, Mengxiao Bi, Xiaokang Yang

Experiments demonstrate the effectiveness and generalization of our Face Clan for various pre-trained GANs.

Attribute Denoising

E1 TTS: Simple and Fast Non-Autoregressive TTS

no code implementations • 14 Sep 2024 • Zhijun Liu, Shuai Wang, Pengcheng Zhu, Mengxiao Bi, Haizhou Li

This paper introduces Easy One-Step Text-to-Speech (E1 TTS), an efficient non-autoregressive zero-shot text-to-speech system based on denoising diffusion pretraining and distribution matching distillation.

Denoising Text to Speech

HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects

no code implementations • 17 Jul 2024 • Xintao Lv, Liang Xu, Yichao Yan, Xin Jin, Congsheng Xu, Shuwen Wu, Yifan Liu, Lincheng Li, Mengxiao Bi, Wenjun Zeng, Xiaokang Yang

Thus, we propose HIMO, a large-scale MoCap dataset of full-body human interacting with multiple objects, containing 3.3K 4D HOI sequences and 4.08M 3D HOI frames.

Benchmarking Human-Object Interaction Detection +1

DualVC 3: Leveraging Language Model Generated Pseudo Context for End-to-end Low Latency Streaming Voice Conversion

no code implementations • 12 Jun 2024 • Ziqian Ning, Shuai Wang, Pengcheng Zhu, Zhichao Wang, Jixun Yao, Lei Xie, Mengxiao Bi

With speaker-independent semantic tokens to guide the training of the content encoder, the dependency on ASR is removed and the model can operate under extremely small chunks, with cascading errors eliminated.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis

no code implementations • 2 Apr 2024 • Shuai Tan, Bin Ji, Mengxiao Bi, Ye Pan

Achieving disentangled control over multiple facial motions and accommodating diverse input modalities greatly enhances the application and entertainment of the talking head generation.

Disentanglement Talking Head Generation

Multi-GradSpeech: Towards Diffusion-based Multi-Speaker Text-to-speech Using Consistent Diffusion Models

no code implementations • 21 Aug 2023 • Heyang Xue, Shuai Guo, Pengcheng Zhu, Mengxiao Bi

Despite imperfect score-matching causing drift in training and sampling distributions of diffusion models, recent advances in diffusion-based acoustic models have revolutionized data-sufficient single-speaker Text-to-Speech (TTS) approaches, with Grad-TTS being a prime example.

Text to Speech

DualVC: Dual-mode Voice Conversion using Intra-model Knowledge Distillation and Hybrid Predictive Coding

no code implementations • 21 May 2023 • Ziqian Ning, Yuepeng Jiang, Pengcheng Zhu, Jixun Yao, Shuai Wang, Lei Xie, Mengxiao Bi

Voice conversion is an increasingly popular technology, and the growing number of real-time applications requires models with streaming conversion capabilities.

Data Augmentation Decoder +2

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations • 9 Nov 2022 • Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input.

Decoder Voice Conversion

One-shot Voice Conversion For Style Transfer Based On Speaker Adaptation

no code implementations • 24 Nov 2021 • Zhichao Wang, Qicong Xie, Tao Li, Hongqiang Du, Lei Xie, Pengcheng Zhu, Mengxiao Bi

One-shot style transfer is a challenging task, since training on one utterance makes the model extremely prone to over-fitting the training data, causing low speaker similarity and a lack of expressiveness.

Style Transfer Voice Conversion

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates audio waveform from lyrics and musical score.

Decoder Singing Voice Synthesis +1

Deep Feed-forward Sequential Memory Networks for Speech Synthesis

no code implementations • 26 Feb 2018 • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan

The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody.

speech-recognition Speech Recognition +2
