Search Results for author: Kainan Peng

Found 12 papers, 6 papers with code

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.

Attribute

Zero-Shot Accent Conversion using Pseudo Siamese Disentanglement Network

no code implementations • 12 Dec 2022 • Dongya Jia, Qiao Tian, Kainan Peng, Jiaxin Li, Yuanzhe Chen, Mingbo Ma, Yuping Wang, Yuxuan Wang

The goal of accent conversion (AC) is to convert the accent of speech into the target accent while preserving the content and speaker identity.

Data Augmentation · Disentanglement

Parallel Neural Text-to-Speech

no code implementations • ICLR 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we first propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

Text to Speech
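The ParaNet abstract above hinges on the contrast between autoregressive and non-autoregressive spectrogram decoding. A minimal NumPy sketch of that contrast (illustrative stand-in functions only, not the actual ParaNet architecture): the autoregressive decoder must take one sequential step per frame, while the non-autoregressive decoder predicts every frame from the text encoding in a single matrix operation.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 8            # hidden size (illustrative)
T_TEXT = 5       # encoded text length
N_FRAMES = 20    # spectrogram frames to generate

enc = rng.standard_normal((T_TEXT, D))   # stand-in for an encoded text sequence
W = rng.standard_normal((D, D))          # stand-in decoder projection

def autoregressive_decode():
    # Frame t is conditioned on frame t-1: N_FRAMES sequential steps.
    prev = np.zeros(D)
    out = []
    for _ in range(N_FRAMES):
        ctx = enc.mean(axis=0)           # crude attention stand-in
        prev = np.tanh((ctx + prev) @ W)
        out.append(prev)
    return np.stack(out)

def parallel_decode():
    # Non-autoregressive: each frame depends only on the text encoding
    # and its own position, so all frames come out of one matrix op.
    ctx = enc.mean(axis=0)
    pos = np.arange(N_FRAMES)[:, None] / N_FRAMES
    return np.tanh((ctx + pos) @ W)      # shape (N_FRAMES, D), one pass
```

The sequential loop versus the single batched call is what makes parallel synthesis fast on accelerators, at the cost of modeling inter-frame dependencies directly.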

WaveFlow: A Compact Flow-based Model for Raw Audio

4 code implementations • ICML 2020 • Wei Ping, Kainan Peng, Kexin Zhao, Zhao Song

WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases.

Speech Synthesis
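The "unified view" in the WaveFlow abstract comes from folding a 1-D waveform into a 2-D array of height h, modeling the h rows autoregressively and the columns in parallel; h at its extremes recovers the fully autoregressive (WaveNet-like) and fully parallel (WaveGlow-like) cases. A small sketch of just that fold (the function name is ours, not from the paper's code):

```python
import numpy as np

def squeeze_waveform(x, h):
    # Fold a length-T waveform into an h x (T // h) matrix so that
    # adjacent samples land in the same column; a WaveFlow-style model
    # is autoregressive over the h rows and parallel across columns.
    T = len(x)
    assert T % h == 0, "h must divide the waveform length"
    return x.reshape(T // h, h).T

# h == T  -> one column, every row sequential (fully autoregressive)
# h == 1  -> one row, everything parallel
```

The height h thus trades sequential steps for parallelism, which is why the abstract can describe WaveNet and WaveGlow as special cases of one family.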

Multi-Speaker End-to-End Speech Synthesis

no code implementations • 9 Jul 2019 • Jihyun Park, Kexin Zhao, Kainan Peng, Wei Ping

In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers.

Speech Synthesis

Non-Autoregressive Neural Text-to-Speech

2 code implementations • ICML 2020 • Kainan Peng, Wei Ping, Zhao Song, Kexin Zhao

In this work, we propose ParaNet, a non-autoregressive seq2seq model that converts text to spectrogram.

Text to Speech · Text-To-Speech Synthesis

Neural Voice Cloning with a Few Samples

2 code implementations • NeurIPS 2018 • Sercan O. Arik, Jitong Chen, Kainan Peng, Wei Ping, Yanqi Zhou

Speaker adaptation is based on fine-tuning a multi-speaker generative model with a few cloning samples.

Speech Synthesis · Voice Cloning
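The speaker-adaptation idea in the abstract above — keep the multi-speaker generative model fixed and fit only the new speaker's parameters to a few cloning samples — can be sketched in miniature. Everything here is a toy stand-in (the real model is a neural TTS system, not a single tanh layer): the backbone W stays frozen, and gradient descent updates only the speaker embedding.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 4
W = rng.standard_normal((D, D))   # frozen "multi-speaker backbone" stand-in

def synthesize(speaker_emb):
    # Toy generative model: output conditioned on the speaker embedding.
    return np.tanh(W @ speaker_emb)

def adapt_speaker(cloning_target, steps=200, lr=0.1):
    # Speaker adaptation: W is never updated; only the new speaker's
    # embedding is fit to the cloning sample by gradient descent on MSE.
    emb = np.zeros(D)
    for _ in range(steps):
        out = synthesize(emb)
        err = out - cloning_target
        # Gradient of mean((tanh(W @ emb) - y)**2) w.r.t. emb
        grad = W.T @ (err * (1 - out**2)) * (2 / D)
        emb -= lr * grad
    return emb
```

Because only the low-dimensional embedding is trained, a handful of samples suffices and the shared backbone cannot overfit to them, which is the appeal of this adaptation scheme.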

HybridNet: A Hybrid Neural Architecture to Speed-up Autoregressive Models

no code implementations • ICLR 2018 • Yanqi Zhou, Wei Ping, Sercan Arik, Kainan Peng, Greg Diamos

This paper introduces HybridNet, a hybrid neural network to speed-up autoregressive models for raw audio waveform generation.

Speech Synthesis · Text to Speech

Deep Voice 2: Multi-Speaker Neural Text-to-Speech

1 code implementation • NeurIPS 2017 • Sercan Arik, Gregory Diamos, Andrew Gibiansky, John Miller, Kainan Peng, Wei Ping, Jonathan Raiman, Yanqi Zhou

We introduce Deep Voice 2, which is based on a pipeline similar to Deep Voice 1 but constructed with higher-performance building blocks, and demonstrates a significant audio quality improvement over Deep Voice 1.

Speech Synthesis · Text to Speech
