Search Results for author: Frank K. Soong

Found 13 papers, 4 papers with code

ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading

no code implementations • 3 Jul 2023 • Yujia Xiao, Shaofei Zhang, Xi Wang, Xu Tan, Lei He, Sheng Zhao, Frank K. Soong, Tan Lee

Experiments show that ContextSpeech significantly improves the voice quality and prosody expressiveness in paragraph reading with competitive model efficiency.

Tasks: Sentence

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

1 code implementation • 22 Sep 2022 • Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

A vector-quantized variational autoencoder (VQ-VAE) based feature analyzer encodes the Mel spectrograms of the speech training data by progressively down-sampling them in multiple stages into MSMC Representations (MSMCRs) at different time resolutions, quantizing each stage with its own VQ codebook.
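The multi-stage, multi-codebook idea can be illustrated with a toy sketch (not the authors' implementation): 1-D "features" stand in for Mel-spectrogram frames, average pooling stands in for the learned down-sampling, and each stage quantizes against its own hand-picked codebook.

```python
# Toy sketch of multi-stage, multi-codebook vector quantization (MSMC-style).
# Assumptions: scalar features, average-pool down-sampling, tiny codebooks --
# all chosen for illustration only.

def downsample(seq, factor):
    """Average-pool a 1-D sequence by `factor` to get a coarser time resolution."""
    return [sum(seq[i:i + factor]) / factor for i in range(0, len(seq), factor)]

def quantize(seq, codebook):
    """Replace each value with its nearest codebook entry."""
    return [min(codebook, key=lambda c: abs(c - x)) for x in seq]

def msmc_encode(features, stages):
    """Encode `features` into quantized representations at several time
    resolutions. `stages` is a list of (downsample_factor, codebook) pairs,
    one per stage; each stage starts from the previous stage's sequence."""
    out, seq = [], features
    for factor, codebook in stages:
        seq = downsample(seq, factor)
        out.append(quantize(seq, codebook))
    return out

# Example: 8 frames, two stages (2x then 2x down-sampling), separate codebooks.
features = [0.1, 0.9, 1.1, 1.9, 2.1, 2.9, 3.1, 3.9]
reps = msmc_encode(features, [(2, [0.5, 1.5, 2.5, 3.5]), (2, [1.0, 3.0])])
print(reps)
```

Each stage halves the time resolution and uses its own codebook, mirroring the paper's idea of capturing speech structure at multiple granularities.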

ParaTTS: Learning Linguistic and Prosodic Cross-sentence Information in Paragraph-based TTS

no code implementations • 14 Sep 2022 • Liumeng Xue, Frank K. Soong, Shaofei Zhang, Lei Xie

To alleviate training difficulty, we propose to model linguistic and prosodic information by considering the cross-sentence, embedded structure during training.

Tasks: Position, Sentence

Neural Lexicon Reader: Reduce Pronunciation Errors in End-to-end TTS by Leveraging External Textual Knowledge

no code implementations • 19 Oct 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

End-to-end TTS requires a large amount of speech/text paired data to cover all necessary knowledge, particularly how to pronounce different words in diverse contexts, so that a neural model may learn such knowledge accordingly.

Speech BERT Embedding For Improving Prosody in Neural TTS

no code implementations • 8 Jun 2021 • Liping Chen, Yan Deng, Xi Wang, Frank K. Soong, Lei He

Experimental results obtained with the Transformer TTS show that the proposed BERT can extract fine-grained, segment-level prosody, which is complementary to utterance-level prosody and improves the final prosody of the TTS speech.

Multilingual Byte2Speech Models for Scalable Low-resource Speech Synthesis

2 code implementations • 5 Mar 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong

To scale neural speech synthesis to various real-world languages, we present a multilingual end-to-end framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.
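The byte-input idea on the text side can be sketched in a few lines (the synthesis model itself is omitted): any script is mapped to UTF-8 byte IDs, so a single 256-symbol vocabulary covers all languages.

```python
# Minimal sketch of a byte-level text frontend in the spirit of Byte2Speech:
# map text in any script to UTF-8 byte values so one fixed vocabulary of 256
# symbols handles arbitrary input scripts. This is only the tokenizer side;
# the spectrogram-prediction model is not shown.

def text_to_byte_ids(text):
    """Encode text of any script as a sequence of UTF-8 byte values (0-255)."""
    return list(text.encode("utf-8"))

def byte_ids_to_text(ids):
    """Inverse mapping, useful for debugging and inspection."""
    return bytes(ids).decode("utf-8")

ascii_ids = text_to_byte_ids("hi")    # one byte per ASCII character
cjk_ids = text_to_byte_ids("你好")     # multiple bytes per CJK character
print(ascii_ids, cjk_ids)
```

Note that non-Latin scripts expand to several bytes per character, which is the trade-off byte-level models accept in exchange for a universal input vocabulary.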

Tasks: Speech Synthesis

Forward-Backward Decoding for Regularizing End-to-End TTS

1 code implementation • 18 Jul 2019 • Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao

Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (a revised version of Tacotron 2) by a MOS gap of 0.14 on a challenging test set and achieving close-to-human quality (4.42 vs. 4.49 MOS) on a general test set.
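The core of the bidirectional-decoder regularization can be sketched as an agreement penalty between a left-to-right and a right-to-left decoding of the same target (the exact distance metric and which layer it is applied to are assumptions here, not taken from the paper):

```python
# Hedged sketch of bidirectional decoder regularization: decode the target
# forward and backward, then penalize disagreement between the two output
# sequences. An L2 distance on scalar per-frame outputs is assumed purely
# for illustration.

def l2_agreement_loss(forward_out, backward_out):
    """Mean squared distance between the forward outputs and the
    time-reversed backward outputs; small when the two decoders agree."""
    rev = backward_out[::-1]
    return sum((f - b) ** 2 for f, b in zip(forward_out, rev)) / len(forward_out)

# Toy 'decoder outputs' over 4 frames:
fwd = [0.2, 0.4, 0.6, 0.8]
bwd = [0.8, 0.6, 0.4, 0.2]   # backward decoder emits frames in reverse order
print(l2_agreement_loss(fwd, bwd))
```

Adding such a term to the training loss pushes both decoders toward consistent predictions, which is the regularizing effect the paper attributes to the bidirectional setup.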

Exploiting Syntactic Features in a Parsed Tree to Improve End-to-End TTS

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

End-to-end TTS, which can predict speech directly from a given sequence of graphemes or phonemes, has shown improved performance over conventional TTS.

Tasks: Sentence

A New GAN-based End-to-End TTS Training Algorithm

no code implementations • 9 Apr 2019 • Haohan Guo, Frank K. Soong, Lei He, Lei Xie

However, autoregressive module training is affected by exposure bias, i.e., the mismatch between the distributions of real and predicted data.
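Exposure bias can be demonstrated with a toy one-step predictor (a hypothetical stand-in, not the paper's model): with teacher forcing each step conditions on the true previous value, but at inference each step conditions on its own previous prediction, so a small systematic error compounds over the sequence.

```python
# Toy illustration of exposure bias in autoregressive decoding. The
# 'model' predict_next is hypothetical: the true increment per step is
# 1.0, but the model systematically overshoots by 0.1.

def predict_next(prev):
    """Toy autoregressive step with a small systematic error."""
    return prev + 1.0 + 0.1

def rollout_teacher_forced(targets):
    """Each step sees the TRUE previous value (training-time conditions)."""
    return [predict_next(t) for t in targets[:-1]]

def rollout_free_running(start, steps):
    """Each step sees its OWN previous prediction (inference-time conditions)."""
    out, prev = [], start
    for _ in range(steps):
        prev = predict_next(prev)
        out.append(prev)
    return out

targets = [0.0, 1.0, 2.0, 3.0]
tf = rollout_teacher_forced(targets)   # error stays at 0.1 per step
fr = rollout_free_running(0.0, 3)      # errors accumulate: 0.1, 0.2, 0.3
print(tf, fr)
```

The GAN-based training proposed in the paper is one way to narrow this train/inference distribution mismatch.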

Tasks: Generative Adversarial Network, Sentence (+1 more)

Feature reinforcement with word embedding and parsing information in neural TTS

no code implementations • 3 Jan 2019 • Huaiping Ming, Lei He, Haohan Guo, Frank K. Soong

In this paper, we propose a feature reinforcement method under the sequence-to-sequence neural text-to-speech (TTS) synthesis framework.

Tasks: Sentence

A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding

no code implementations • 1 Nov 2015 • Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for modeling and predicting sequential data, e.g. speech utterances or handwritten documents.

Tasks: Chunking, Feature Engineering (+4 more)

Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network

4 code implementations • 21 Oct 2015 • Peilu Wang, Yao Qian, Frank K. Soong, Lei He, Hai Zhao

Bidirectional Long Short-Term Memory Recurrent Neural Network (BLSTM-RNN) has been shown to be very effective for tagging sequential data, e.g. speech utterances or handwritten documents.
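The bidirectional idea behind a BLSTM tagger can be sketched with a toy recurrence (a simple linear update stands in for the learned LSTM gates, an assumption made only to keep the sketch self-contained): each position's representation combines a left-to-right state carrying past context with a right-to-left state carrying future context.

```python
# Toy sketch of the bidirectional pattern in a BLSTM tagger. A linear
# recurrence h_t = decay * h_{t-1} + x_t replaces the real LSTM cell;
# real taggers use learned gates and a classifier over the paired states.

def run_rnn(inputs, decay=0.5):
    """Toy unidirectional recurrence: h_t = decay * h_{t-1} + x_t."""
    h, states = 0.0, []
    for x in inputs:
        h = decay * h + x
        states.append(h)
    return states

def bidirectional_states(inputs):
    """Pair each position's forward state (past context) with its backward
    state (future context), as a BLSTM concatenates per-position features."""
    fwd = run_rnn(inputs)
    bwd = run_rnn(inputs[::-1])[::-1]   # backward pass, re-aligned to positions
    return list(zip(fwd, bwd))

states = bidirectional_states([1.0, 0.0, 0.0, 2.0])
print(states)
```

A per-position tag classifier would then read both halves of each pair, which is why BLSTMs suit tagging tasks where the correct label depends on words on either side.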

Tasks: Part-Of-Speech Tagging, POS (+1 more)
