no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma
We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.
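The abstract does not describe the architecture, but the single-pass, multi-attribute idea can be illustrated with a hypothetical conditioning sketch: encode speaker-independent content, inject an attribute embedding, and add the timbre embedding back unchanged. All modules, names, and dimensions below are assumptions for illustration, not VoiceShop's actual design.

```python
# Hypothetical sketch of multi-attribute speech editing in one forward pass.
# None of these modules come from the VoiceShop paper; they only illustrate
# the general conditioning pattern described in the abstract.
import torch
import torch.nn as nn

class AttributeEditor(nn.Module):
    def __init__(self, content_dim=256, attr_vocab=16, attr_dim=64):
        super().__init__()
        self.attr_emb = nn.Embedding(attr_vocab, attr_dim)  # age/gender/accent/style codes
        self.decoder = nn.GRU(content_dim + attr_dim, content_dim, batch_first=True)
        self.out = nn.Linear(content_dim, 80)  # 80-bin mel-spectrogram frames

    def forward(self, content, attr_id, timbre):
        # content: (B, T, content_dim) speaker-independent features
        # timbre:  (B, content_dim) speaker embedding, added back unchanged
        attr = self.attr_emb(attr_id).unsqueeze(1).expand(-1, content.size(1), -1)
        h, _ = self.decoder(torch.cat([content, attr], dim=-1))
        return self.out(h + timbre.unsqueeze(1))  # single pass, timbre preserved

mel = AttributeEditor()(torch.randn(1, 100, 256), torch.tensor([3]), torch.randn(1, 256))
print(mel.shape)  # torch.Size([1, 100, 80])
```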
no code implementations • 12 Dec 2022 • Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang
The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity.
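A common way to sanity-check the "preserving the content and speaker identity" half of this goal is to compare speaker embeddings of the input and converted utterances. The sketch below is a minimal illustration with a stand-in embedding function; a real check would use a pretrained speaker-verification encoder, which is not part of this paper.

```python
# Sanity check for accent conversion: the converted utterance should keep the
# source speaker's identity. `embed` is a stand-in for any pretrained
# speaker-verification encoder; the similarity target is illustrative.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def embed(waveform):
    # Placeholder: a real system would run a speaker-verification model here.
    rng = np.random.default_rng(abs(hash(waveform.tobytes())) % 2**32)
    return rng.standard_normal(192)

src = np.random.randn(16000).astype(np.float32)  # 1 s of source speech
converted = src.copy()                           # stand-in for the AC output
sim = cosine(embed(src), embed(converted))
print(f"speaker similarity: {sim:.3f}")          # want sim close to 1.0
```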
no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song
WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.
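The "unified view" refers to the quantity all of these models maximize: the exact log-likelihood given by the change-of-variables formula. A minimal sketch of that computation, with a toy elementwise affine flow standing in for the real networks:

```python
# Minimal change-of-variables log-likelihood, the training objective shared by
# the models in this unified view (WaveNet, WaveGlow, WaveFlow). The
# elementwise affine flow is a toy stand-in for the actual networks.
import numpy as np

def log_likelihood(x, log_scale, shift):
    z = x * np.exp(log_scale) + shift                  # z = f(x), invertible map
    log_pz = -0.5 * (z**2 + np.log(2 * np.pi)).sum()   # standard-normal prior
    log_det = log_scale.sum()                          # log |det df/dx|
    return log_pz + log_det

x = np.random.randn(16, 8)  # a 1-D waveform squeezed into a 2-D array,
                            # as WaveFlow does before applying its flow
print(log_likelihood(x, np.zeros_like(x), np.zeros_like(x)))
```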
no code implementations • Findings of the Association for Computational Linguistics 2020 • Mingbo Ma, Baigong Zheng, Kaibo Liu, Renjie Zheng, Hairong Liu, Kainan Peng, Kenneth Church, Liang Huang
Text-to-speech synthesis (TTS) has witnessed rapid progress in recent years, with neural methods becoming capable of producing audio with high naturalness.
no code implementations • 9 Jul 2019 • Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping
In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers.
2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao
In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.
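"Non-autoregressive" here means every spectrogram frame is produced in one parallel pass over the encoded text, rather than frame by frame. The sketch below illustrates only that idea; the real ParaNet is a fully convolutional attention-based model, and the layers, sizes, and fixed upsampling ratio here are simplified stand-ins.

```python
# Hypothetical sketch of the non-autoregressive idea: all spectrogram frames
# are predicted in one parallel pass from the encoded text, instead of one
# frame at a time conditioned on previous frames.
import torch
import torch.nn as nn

class TinyParallelTTS(nn.Module):
    def __init__(self, vocab=64, d=128, n_mels=80, upsample=8):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.encoder = nn.Conv1d(d, d, kernel_size=5, padding=2)
        self.upsample = upsample              # crude fixed text-to-frame ratio
        self.decoder = nn.Conv1d(d, n_mels, kernel_size=5, padding=2)

    def forward(self, tokens):                # tokens: (B, N) character ids
        h = self.encoder(self.emb(tokens).transpose(1, 2))  # (B, d, N)
        h = h.repeat_interleave(self.upsample, dim=2)       # (B, d, T)
        return self.decoder(h)                # (B, n_mels, T): all frames at once

mel = TinyParallelTTS()(torch.randint(0, 64, (1, 20)))
print(mel.shape)  # torch.Size([1, 80, 160])
```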
5 code implementations • ICLR 2019 • Wei Ping, Kainan Peng, Jitong Chen
In this work, we propose a new solution for parallel wave generation by WaveNet.
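ClariNet's solution distills a parallel student with Gaussian output distributions from a Gaussian autoregressive WaveNet teacher, which makes the per-sample KL divergence between the two available in closed form. The textbook formula for that Gaussian KL:

```python
# Per-sample KL divergence between two univariate Gaussians, which has a
# closed form when both the student and the WaveNet teacher predict Gaussian
# output distributions, as in ClariNet's distillation setup.
import numpy as np

def gaussian_kl(mu_q, sigma_q, mu_p, sigma_p):
    # KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) )
    return (np.log(sigma_p / sigma_q)
            + (sigma_q**2 + (mu_q - mu_p)**2) / (2 * sigma_p**2)
            - 0.5)

# Student slightly off from the teacher at one waveform sample:
print(gaussian_kl(mu_q=0.1, sigma_q=0.9, mu_p=0.0, sigma_p=1.0))  # small, > 0
```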
2 code implementations • NeurIPS 2018 • Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou
Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.
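This recipe can be sketched directly: keep the pretrained multi-speaker model frozen and fit new speaker parameters on the few cloning samples. The paper studies several adaptation variants; the minimal sketch below fine-tunes only a speaker embedding, with stand-in modules and illustrative hyperparameters.

```python
# Hedged sketch of speaker adaptation: freeze the pretrained multi-speaker
# generator and fit only a new speaker embedding on a few cloning samples.
import torch
import torch.nn as nn

backbone = nn.Linear(64, 80)                 # stand-in pretrained generator
for p in backbone.parameters():
    p.requires_grad = False                  # freeze shared weights

speaker_emb = nn.Parameter(torch.zeros(64))  # the only adapted parameters
opt = torch.optim.Adam([speaker_emb], lr=1e-2)

cloning_mels = torch.randn(5, 80)            # a few samples of the new voice
for step in range(100):
    opt.zero_grad()
    pred = backbone(speaker_emb).expand_as(cloning_mels)
    loss = ((pred - cloning_mels) ** 2).mean()
    loss.backward()
    opt.step()
print(f"final adaptation loss: {loss.item():.4f}")
```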
no code implementations • ICLR 2018 • Yanqi Zhou, Wei Ping, Sercan Arik, Kainan Peng, Greg Diamos
This paper introduces HybridNet, a hybrid neural network that speeds up autoregressive models for raw audio waveform generation.
7 code implementations • ICLR 2018 • Wei Ping, Kainan Peng, Andrew Gibiansky, Sercan O. Arik, Ajay Kannan, Sharan Narang, Jonathan Raiman, John Miller
We present Deep Voice 3, a fully-convolutional attention-based neural text-to-speech (TTS) system.
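"Fully-convolutional" means both encoder and decoder are built from stacked convolution blocks rather than recurrent layers. Below is a sketch of the kind of gated, residual 1-D convolution block such a system stacks; this is a causal variant with illustrative sizes, not the exact Deep Voice 3 block.

```python
# Sketch of a gated, residual 1-D convolution block of the kind a
# fully-convolutional TTS stacks: causal conv + gated linear unit + residual.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedConvBlock(nn.Module):
    def __init__(self, channels=128, kernel_size=5):
        super().__init__()
        self.pad = kernel_size - 1                       # causal: left-pad only
        self.conv = nn.Conv1d(channels, 2 * channels, kernel_size)

    def forward(self, x):                                # x: (B, C, T)
        h = self.conv(F.pad(x, (self.pad, 0)))           # (B, 2C, T)
        return (x + F.glu(h, dim=1)) * (0.5 ** 0.5)      # gate, residual, scale

x = torch.randn(1, 128, 50)
print(GatedConvBlock()(x).shape)  # torch.Size([1, 128, 50])
```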
1 code implementation • NeurIPS 2017 • Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou
We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1 but constructed with higher-performance building blocks, and demonstrates a significant audio quality improvement over Deep Voice 1.