Search Results for author: Nanxin Chen

Found 14 papers, 7 papers with code

A Comparative Study on Non-Autoregressive Modelings for Speech-to-Text Generation

no code implementations11 Oct 2021 Yosuke Higuchi, Nanxin Chen, Yuya Fujita, Hirofumi Inaguma, Tatsuya Komatsu, Jaesong Lee, Jumon Nozaki, Tianzi Wang, Shinji Watanabe

Non-autoregressive (NAR) models generate multiple outputs in a sequence simultaneously, which significantly speeds up inference at the cost of an accuracy drop compared to autoregressive baselines.

Automatic Speech Recognition, Speech Recognition +2
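The snippet above hinges on the autoregressive/non-autoregressive distinction: AR decoders make one model call per output token, while NAR decoders emit all positions in one parallel pass. A minimal toy sketch (not from the paper; the lambda "models" are illustrative stand-ins) of the two decoding loops:

```python
# Toy contrast between autoregressive and non-autoregressive decoding.
# The "models" here are dummy callables, not real ASR decoders.

def autoregressive_decode(step_fn, length):
    """Emit tokens one at a time; each step conditions on the prefix."""
    tokens = []
    for _ in range(length):
        tokens.append(step_fn(tokens))  # one model call per output token
    return tokens

def non_autoregressive_decode(parallel_fn, length):
    """Emit every position at once; a single model call for the sequence."""
    return parallel_fn(length)

# Dummy stand-ins: AR predicts "next index", NAR predicts all indices at once.
ar = autoregressive_decode(lambda prefix: len(prefix), 5)    # 5 model calls
nar = non_autoregressive_decode(lambda n: list(range(n)), 5) # 1 model call
print(ar, nar)  # both produce [0, 1, 2, 3, 4]
```

The speed/accuracy trade-off the snippet mentions follows from this shape: the NAR pass cannot condition one output position on another, which is what the paper's compared modelings try to compensate for.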

WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis

2 code implementations17 Jun 2021 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, Najim Dehak, William Chan

The model takes an input phoneme sequence and, through an iterative refinement process, generates an audio waveform.

Speech Synthesis, Text-To-Speech Synthesis

Focus on the present: a regularization method for the ASR source-target attention layer

no code implementations2 Nov 2020 Nanxin Chen, Piotr Żelasko, Jesús Villalba, Najim Dehak

This paper introduces a novel method to diagnose the source-target attention in state-of-the-art end-to-end speech recognition models with joint connectionist temporal classification (CTC) and attention training.

End-To-End Speech Recognition, Speech Recognition

WaveGrad: Estimating Gradients for Waveform Generation

7 code implementations ICLR 2021 Nanxin Chen, Yu Zhang, Heiga Zen, Ron J. Weiss, Mohammad Norouzi, William Chan

This paper introduces WaveGrad, a conditional model for waveform generation which estimates gradients of the data density.

Speech Synthesis, Text-To-Speech Synthesis
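"Estimating gradients of the data density" refers to score-based generation: start from Gaussian noise and repeatedly nudge the sample along an estimated gradient of the log-density. A hedged sketch of that refinement loop, assuming a hand-written `toy_score` in place of WaveGrad's learned, mel-spectrogram-conditioned network:

```python
# Score-based iterative refinement in the spirit of WaveGrad.
# `toy_score` is a hypothetical stand-in for the learned gradient estimator;
# the real model is a neural network conditioned on a mel-spectrogram.
import numpy as np

def toy_score(x, target):
    """Stand-in gradient of log-density: pulls x toward the target signal."""
    return target - x

def iterative_refine(target, steps=50, step_size=0.1, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(target.shape)  # start from pure noise
    for _ in range(steps):
        # Langevin-style update: gradient step plus a small noise injection
        x = x + step_size * toy_score(x, target)
        x = x + 0.01 * np.sqrt(2 * step_size) * rng.standard_normal(x.shape)
    return x

target = np.sin(np.linspace(0, 2 * np.pi, 16))  # pretend "waveform"
refined = iterative_refine(target)
print(np.abs(refined - target).max())  # small residual: noise refined toward target
```

The number of refinement steps trades quality for speed, which is the knob WaveGrad exposes between generation fidelity and inference cost.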

Mask CTC: Non-Autoregressive End-to-End ASR with CTC and Mask Predict

no code implementations18 May 2020 Yosuke Higuchi, Shinji Watanabe, Nanxin Chen, Tetsuji Ogawa, Tetsunori Kobayashi

In this work, the Mask CTC model is trained using a Transformer encoder-decoder with joint training of mask prediction and CTC.

Audio and Speech Processing, Sound
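The mask-prediction half of the training objective pairs with a mask-predict decoding loop: low-confidence positions in an initial CTC output are masked, then filled in over a few iterations with the model's most confident predictions. A toy sketch, assuming a hypothetical `toy_predict` in place of the paper's Transformer decoder:

```python
# Mask-predict style fill-in decoding, in the spirit of Mask CTC.
# `toy_predict` is an illustrative stand-in that always "knows" the answer;
# the real model scores candidates with a trained decoder.
MASK = "<mask>"

def toy_predict(tokens, reference):
    """Stand-in predictor: one (position, token, confidence) per mask."""
    return [(i, reference[i], 1.0) for i, t in enumerate(tokens) if t == MASK]

def mask_predict_decode(tokens, reference, fills_per_iter=1):
    tokens = list(tokens)
    while MASK in tokens:
        candidates = toy_predict(tokens, reference)
        # Commit only the most confident predictions each iteration, so later
        # fills can condition on earlier ones.
        best = sorted(candidates, key=lambda c: -c[2])[:fills_per_iter]
        for pos, tok, _conf in best:
            tokens[pos] = tok
    return tokens

partial = ["h", MASK, "l", MASK, "o"]
print(mask_predict_decode(partial, list("hello")))  # ['h', 'e', 'l', 'l', 'o']
```

Because each iteration fills several positions at once, the number of decoder passes stays small and roughly constant in sequence length, which is the source of the non-autoregressive speedup.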

Improving Language Identification for Multilingual Speakers

no code implementations29 Jan 2020 Andrew Titus, Jan Silovsky, Nanxin Chen, Roger Hsiao, Mary Young, Arnab Ghoshal

Spoken language identification (LID) technologies have improved in recent years from discriminating largely distinct languages to discriminating highly similar languages or even dialects of the same language.

Language Identification, Spoken Language Identification

Listen and Fill in the Missing Letters: Non-Autoregressive Transformer for Speech Recognition

no code implementations10 Nov 2019 Nanxin Chen, Shinji Watanabe, Jesús Villalba, Najim Dehak

In this paper, we study two different non-autoregressive Transformer structures for automatic speech recognition (ASR): A-CMLM and A-FMLM.

Automatic Speech Recognition, Machine Translation +1

Zero-Shot Multi-Speaker Text-To-Speech with State-of-the-art Neural Speaker Embeddings

3 code implementations23 Oct 2019 Erica Cooper, Cheng-I Lai, Yusuke Yasuda, Fuming Fang, Xin Wang, Nanxin Chen, Junichi Yamagishi

While speaker adaptation for end-to-end speech synthesis using speaker embeddings can produce good speaker similarity for speakers seen during training, there remains a gap for zero-shot adaptation to unseen speakers.

Audio and Speech Processing

ASSERT: Anti-Spoofing with Squeeze-Excitation and Residual neTworks

1 code implementation1 Apr 2019 Cheng-I Lai, Nanxin Chen, Jesús Villalba, Najim Dehak

We present JHU's system submission to the ASVspoof 2019 Challenge: Anti-Spoofing with Squeeze-Excitation and Residual neTworks (ASSERT).

Feature Engineering, Voice Conversion
