no code implementations • 27 Oct 2024 • Maohao Shen, Shun Zhang, JiLong Wu, Zhiping Xiu, Ehab AlBadawy, Yiting Lu, Mike Seltzer, Qing He
Finally, we further explore MoLE-Llama in text-in-speech-out QA tasks, demonstrating its great potential as a multimodal dialog system capable of speech generation.
no code implementations • 21 Mar 2023 • Tejas Jayashankar, JiLong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He
A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer.
no code implementations • 1 Mar 2023 • Philipp Klumpp, Pooja Chitkara, Leda Sari, Prashant Serai, JiLong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He
In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation.
no code implementations • 23 Nov 2022 • Mumin Jin, Prashant Serai, JiLong Wu, Andros Tjandra, Vimal Manohar, Qing He
Most people who have tried to learn a foreign language would have experienced difficulties understanding or speaking with a native speaker's accent.
no code implementations • 28 Oct 2022 • Jason Fong, Yun Wang, Prabhav Agrawal, Vimal Manohar, JiLong Wu, Thilo Köhler, Qing He
Text-based voice editing (TBVE) uses synthetic output from text-to-speech (TTS) systems to replace words in an original recording.
1 code implementation • 6 Dec 2021 • Ehab A. AlBadawy, Andrew Gibiansky, Qing He, JiLong Wu, Ming-Ching Chang, Siwei Lyu
We perform subjective and objective evaluations to compare the performance of each vocoder along different axes.
no code implementations • 1 Apr 2021 • Qing He, Zhiping Xiu, Thilo Koehler, JiLong Wu
Typical high-quality text-to-speech (TTS) systems today use a two-stage architecture, with a spectrum model stage that generates spectral frames and a vocoder stage that generates the actual audio.
no code implementations • 4 Jan 2020 • ZiChao Dong, JiLong Wu, TingTing Ren, Yue Wang, MengYing Ge
One uses attention-based methods to focus on informative areas, while the other aims to find high-order relations between features.