Search Results for author: Liumeng Xue

Found 9 papers, 2 papers with code

Single-Codec: Single-Codebook Speech Codec towards High-Performance Speech Generation

no code implementations11 Jun 2024 Hanzhao Li, Liumeng Xue, Haohan Guo, Xinfa Zhu, YuanJun Lv, Lei Xie, Yunlin Chen, Hao Yin, Zhifei Li

The multi-codebook speech codec enables the application of large language models (LLM) in TTS but bottlenecks efficiency and robustness due to multi-sequence prediction.

WenetSpeech4TTS: A 12,800-hour Mandarin TTS Corpus for Large Speech Generation Model Benchmark

1 code implementation9 Jun 2024 Linhan Ma, Dake Guo, Kun Song, Yuepeng Jiang, Shuai Wang, Liumeng Xue, Weiming Xu, Huan Zhao, BinBin Zhang, Lei Xie

Furthermore, we have created subsets of varying sizes, categorized by segment quality scores to allow for TTS model training and fine-tuning.

Transfer the linguistic representations from TTS to accent conversion with non-parallel data

no code implementations7 Jan 2024 Xi Chen, Jiakun Pei, Liumeng Xue, Mingyang Zhang

This paper introduces a novel non-autoregressive framework for accent conversion that learns accent-agnostic linguistic representations and employs them to convert the accent in the source speech.

Voice Conversion

Multi-level Temporal-channel Speaker Retrieval for Zero-shot Voice Conversion

no code implementations12 May 2023 Zhichao Wang, Liumeng Xue, Qiuqiang Kong, Lei Xie, Yuanzhe Chen, Qiao Tian, Yuping Wang

Specifically, to flexibly adapt to the dynamic-variant speaker characteristic in the temporal and channel axis of the speech, we propose a novel fine-grained speaker modeling method, called temporal-channel retrieval (TCR), to find out when and where speaker information appears in speech.

Disentanglement Retrieval +2

Expressive-VC: Highly Expressive Voice Conversion with Attention Fusion of Bottleneck and Perturbation Features

no code implementations9 Nov 2022 Ziqian Ning, Qicong Xie, Pengcheng Zhu, Zhichao Wang, Liumeng Xue, Jixun Yao, Lei Xie, Mengxiao Bi

We further fuse the linguistic and para-linguistic features through an attention mechanism, where speaker-dependent prosody features are adopted as the attention query, which result from a prosody encoder with target speaker embedding and normalized pitch and energy of source speech as input.

Decoder Voice Conversion

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS

no code implementations14 Sep 2022 Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie

To alleviate the difficulty in training, we propose to model linguistic and prosodic information by considering cross-sentence, embedded structure in training.

Position Sentence

Building a mixed-lingual neural TTS system with only monolingual data

no code implementations12 Apr 2019 Liumeng Xue, Wei Song, Guanghui Xu, Lei Xie, Zhizheng Wu

When deploying a Chinese neural text-to-speech (TTS) synthesis system, one of the challenges is to synthesize Chinese utterances with English phrases or words embedded.

Decoder

Cannot find the paper you are looking for? You can Submit a new open access paper.