Search Results for author: Daxin Tan

Found 9 papers, 1 paper with code

CUHK-EE Voice Cloning System for ICASSP 2021 M2VoC Challenge

no code implementations • 8 Mar 2021 • Daxin Tan, Hingpang Huang, Guangyan Zhang, Tan Lee

100 and 5 utterances of 3 target speakers in different voices and styles are provided in tracks 1 and 2 respectively, and the participants are required to synthesize speech in the target speaker's voice and style.

Voice Cloning

EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion

1 code implementation • 4 Jul 2021 • Daxin Tan, Liqun Deng, Yu Ting Yeung, Xin Jiang, Xiao Chen, Tan Lee

This paper presents the design, implementation and evaluation of a speech editing system, named EditSpeech, which allows a user to perform deletion, insertion and replacement of words in a given speech utterance, without causing audible degradation in speech quality and naturalness.

Applying the Information Bottleneck Principle to Prosodic Representation Learning

no code implementations • 5 Aug 2021 • Guangyan Zhang, Ying Qin, Daxin Tan, Tan Lee

This paper describes a novel design of a neural network-based speech generation model for learning prosodic representation. The problem of representation learning is formulated according to the information bottleneck (IB) principle.

Representation Learning

A study on the efficacy of model pre-training in developing neural text-to-speech system

no code implementations • 8 Oct 2021 • Guangyan Zhang, Yichong Leng, Daxin Tan, Ying Qin, Kaitao Song, Xu Tan, Sheng Zhao, Tan Lee

However, in terms of ultimately achieved system performance for target speaker(s), the actual benefits of model pre-training are uncertain and unstable, depending very much on the quantity and text content of training data.

Computational Efficiency

Environment Aware Text-to-Speech Synthesis

no code implementations • 8 Oct 2021 • Daxin Tan, Guangyan Zhang, Tan Lee

The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis.

Attribute Disentanglement

Mixed-Phoneme BERT: Improving BERT with Mixed Phoneme and Sup-Phoneme Representations for Text to Speech

no code implementations • 31 Mar 2022 • Guangyan Zhang, Kaitao Song, Xu Tan, Daxin Tan, Yuzi Yan, Yanqing Liu, Gang Wang, Wei Zhou, Tao Qin, Tan Lee, Sheng Zhao

However, the works apply pre-training with character-based units to enhance the TTS phoneme encoder, which is inconsistent with the TTS fine-tuning that takes phonemes as input.
