1 code implementation • 19 Oct 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
End-to-end TTS requires a large amount of paired speech/text data to cover all the necessary knowledge, particularly how to pronounce different words in diverse contexts, so that a neural model can learn that knowledge.
2 code implementations • 5 Mar 2021 • Mutian He, Jingzhou Yang, Lei He, Frank K. Soong
To scale neural speech synthesis to various real-world languages, we present a multilingual end-to-end framework that maps byte inputs to spectrograms, thus allowing arbitrary input scripts.
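As a rough illustration of why byte inputs allow arbitrary scripts, the sketch below turns text in several scripts into integer IDs drawn from one fixed vocabulary of 256 values. It assumes UTF-8 as the byte encoding, which the abstract does not state, and the helper name `text_to_byte_ids` is hypothetical rather than taken from the authors' code. Any special tokens or language tags the actual framework may use are left out.

```python
def text_to_byte_ids(text: str) -> list[int]:
    """Return the UTF-8 byte values (0-255) of `text` as a list of ints.

    Assumption: UTF-8 is the byte encoding; the source only says "byte inputs".
    """
    return list(text.encode("utf-8"))


if __name__ == "__main__":
    # Latin, Chinese, and Cyrillic text all map into the same 256-symbol space,
    # so no per-language grapheme or phoneme inventory is needed at the input.
    for sample in ["hello", "你好", "привет"]:
        ids = text_to_byte_ids(sample)
        print(sample, len(ids), ids)
```

The trade-off is sequence length: characters outside ASCII expand to several bytes each, so the model has to learn to group bytes into characters on its own.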