Search Results for author: Jian Cong

Found 8 papers, 4 papers with code

MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation

1 code implementation • 31 May 2025 • Yakun Song, Jiawei Chen, Xiaobin Zhuang, Chenpeng Du, Ziyang Ma, Jian Wu, Jian Cong, Dongya Jia, Zhuo Chen, Yuping Wang, Yuxuan Wang, Xie Chen

Most existing codecs are optimized primarily for reconstruction quality, often at the expense of the downstream modelability of the encoded tokens.

Language Modeling

DiTAR: Diffusion Transformer Autoregressive Modeling for Speech Generation

no code implementations • 6 Feb 2025 • Dongya Jia, Zhuo Chen, Jiawei Chen, Chenpeng Du, Jian Wu, Jian Cong, Xiaobin Zhuang, ChuMin Li, Zhen Wei, Yuping Wang, Yuxuan Wang

Several recent studies have attempted to autoregressively generate continuous speech representations without discrete speech tokens by combining diffusion and autoregressive models, yet they often face excessive computational load or suboptimal results.
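For readers unfamiliar with this pattern, below is a minimal sketch of the general "autoregressive backbone plus diffusion-style head over continuous frames" idea the abstract refers to. It is not DiTAR's actual architecture; the class, layers, and hyperparameters (ARDiffusionSketch, the GRU backbone, the crude denoising loop) are all invented for illustration.

# Minimal sketch (NOT DiTAR's architecture): an autoregressive backbone
# summarizes past continuous speech frames, and a small diffusion-style head
# denoises the next frame instead of predicting a discrete token.
import torch
import torch.nn as nn

class ARDiffusionSketch(nn.Module):
    def __init__(self, frame_dim=80, hidden=256, steps=10):
        super().__init__()
        self.steps = steps
        self.backbone = nn.GRU(frame_dim, hidden, batch_first=True)  # stand-in for a causal Transformer
        # Denoising head: conditions on (noisy frame, AR context, timestep).
        self.head = nn.Sequential(
            nn.Linear(frame_dim + hidden + 1, hidden),
            nn.SiLU(),
            nn.Linear(hidden, frame_dim),
        )

    def denoise_step(self, x_t, ctx, t):
        t_emb = torch.full_like(x_t[:, :1], float(t) / self.steps)   # scalar timestep embedding
        return self.head(torch.cat([x_t, ctx, t_emb], dim=-1))

    @torch.no_grad()
    def generate(self, prompt_frames, n_new=5):
        frames = prompt_frames
        for _ in range(n_new):
            _, h = self.backbone(frames)                             # AR context from all past frames
            ctx = h[-1]                                              # (batch, hidden)
            x = torch.randn(frames.size(0), frames.size(-1))         # start the new frame from noise
            for t in reversed(range(self.steps)):                    # crude deterministic denoising loop
                x = x - self.denoise_step(x, ctx, t) / self.steps
            frames = torch.cat([frames, x.unsqueeze(1)], dim=1)
        return frames

if __name__ == "__main__":
    model = ARDiffusionSketch()
    prompt = torch.randn(1, 4, 80)           # 4 continuous, mel-like frames
    out = model.generate(prompt, n_new=3)
    print(out.shape)                          # torch.Size([1, 7, 80])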

Diversity Language Modeling +1

NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality

3 code implementations • 9 May 2022 • Xu Tan, Jiawei Chen, Haohe Liu, Jian Cong, Chen Zhang, Yanqing Liu, Xi Wang, Yichong Leng, YuanHao Yi, Lei He, Frank Soong, Tao Qin, Sheng Zhao, Tie-Yan Liu

In this paper, we answer these questions by first defining human-level quality based on the statistical significance of a subjective measure and introducing appropriate guidelines to judge it, and then developing a TTS system called NaturalSpeech that achieves human-level quality on a benchmark dataset.
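As an illustration of judging quality by the statistical significance of a subjective measure, the snippet below runs a Wilcoxon signed-rank test on paired listening scores. The paper's exact protocol and numbers may differ; the score arrays here are purely hypothetical.

# Generic illustration (not the paper's exact protocol): collect paired
# subjective ratings for recordings vs. synthesized speech on the same
# sentences, then test whether the difference is statistically significant.
from scipy.stats import wilcoxon

recording_scores = [4.50, 4.30, 4.62, 4.40, 4.56, 4.29, 4.58, 4.44]  # hypothetical per-sentence MOS
synthesis_scores = [4.42, 4.35, 4.50, 4.43, 4.50, 4.20, 4.60, 4.40]  # hypothetical per-sentence MOS

stat, p = wilcoxon(recording_scores, synthesis_scores)
# If p >= 0.05, the preference difference is not statistically significant at
# the 5% level, which is one reasonable way to operationalize "no significant
# gap to human recordings".
print(f"Wilcoxon statistic={stat:.3f}, p-value={p:.3f}")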

 Ranked #1 on Text-To-Speech Synthesis on LJSpeech (using extra training data)

Sentence Speech Synthesis +3

VISinger: Variational Inference with Adversarial Learning for End-to-End Singing Voice Synthesis

no code implementations • 17 Oct 2021 • Yongmao Zhang, Jian Cong, Heyang Xue, Lei Xie, Pengcheng Zhu, Mengxiao Bi

In this paper, we propose VISinger, a complete end-to-end high-quality singing voice synthesis (SVS) system that directly generates the audio waveform from lyrics and a musical score.

Decoder Rhythm +2

Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis

no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Lei Xie, Dan Su

The current two-stage TTS framework typically integrates an acoustic model with a vocoder: the acoustic model predicts a low-resolution intermediate representation such as a Mel-spectrogram, while the vocoder generates the waveform from that intermediate representation.
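A minimal sketch of that generic two-stage pipeline is given below, assuming made-up model classes (AcousticModelSketch, VocoderSketch) and toy layers. It only illustrates the data flow from text tokens to Mel-like frames to waveform samples and is not Glow-WaveGAN's implementation.

# Minimal sketch of the generic two-stage TTS pipeline (hypothetical classes).
import torch
import torch.nn as nn

class AcousticModelSketch(nn.Module):
    """Stage 1: text/phoneme IDs -> low-resolution intermediate representation (Mel-like frames)."""
    def __init__(self, vocab=64, mel_bins=80, hidden=128, frames_per_token=4):
        super().__init__()
        self.mel_bins = mel_bins
        self.embed = nn.Embedding(vocab, hidden)
        self.proj = nn.Linear(hidden, mel_bins * frames_per_token)

    def forward(self, token_ids):                             # (batch, tokens)
        h = self.embed(token_ids)                             # (batch, tokens, hidden)
        mel = self.proj(h)                                    # (batch, tokens, mel_bins * frames_per_token)
        return mel.reshape(token_ids.size(0), -1, self.mel_bins)  # (batch, frames, mel_bins)

class VocoderSketch(nn.Module):
    """Stage 2: Mel-like frames -> waveform samples (a toy upsampler standing in for a neural vocoder)."""
    def __init__(self, mel_bins=80, hop=256):
        super().__init__()
        self.to_samples = nn.Linear(mel_bins, hop)            # each frame expands to `hop` waveform samples

    def forward(self, mel):                                   # (batch, frames, mel_bins)
        return self.to_samples(mel).flatten(1)                # (batch, frames * hop)

if __name__ == "__main__":
    text = torch.randint(0, 64, (1, 12))                      # 12 phoneme-like tokens
    mel = AcousticModelSketch()(text)                         # intermediate representation
    wav = VocoderSketch()(mel)                                # waveform from the intermediate representation
    print(mel.shape, wav.shape)                               # torch.Size([1, 48, 80]) torch.Size([1, 12288])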

Speech Synthesis

Controllable Context-aware Conversational Speech Synthesis

no code implementations • 21 Jun 2021 • Jian Cong, Shan Yang, Na Hu, Guangzhi Li, Lei Xie, Dan Su

Specifically, we use explicit labels to represent two typical spontaneous behaviors, filled pauses and prolongation, in the acoustic model, and develop a neural-network-based predictor to predict the occurrences of the two behaviors from text.
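The following is a rough, hypothetical sketch of such a text-based predictor: a per-token classifier over explicit behavior labels. The model, label set, and dimensions are assumptions for illustration, not the paper's design.

# Rough sketch (not the paper's model): predict, for each input token, whether
# a spontaneous behavior should occur at that position.
import torch
import torch.nn as nn

BEHAVIORS = ["none", "filled_pause", "prolongation"]  # hypothetical explicit behavior labels

class BehaviorPredictorSketch(nn.Module):
    def __init__(self, vocab=1000, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab, hidden)
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, len(BEHAVIORS))

    def forward(self, token_ids):                      # (batch, tokens)
        h, _ = self.encoder(self.embed(token_ids))     # (batch, tokens, 2*hidden)
        return self.classifier(h)                      # per-token logits over behavior labels

if __name__ == "__main__":
    model = BehaviorPredictorSketch()
    tokens = torch.randint(0, 1000, (1, 9))            # a tokenized sentence
    labels = model(tokens).argmax(-1)                  # predicted behavior label per token
    print([BEHAVIORS[i] for i in labels[0].tolist()])
    # The predicted labels would then be fed to the acoustic model as explicit
    # control inputs for where to insert filled pauses or prolongations.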

Speech Synthesis
