Search Results for author: Ye Jia

Found 25 papers, 11 papers with code

Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey

no code implementations11 Dec 2023 Haotian Zhang, Semujju Stuart Dereck, Zhicheng Wang, Xianwei Lv, Kang Xu, Liang Wu, Ye Jia, Jing Wu, Zhuo Long, Wensheng Liang, X. G. Ma, Ruiyan Zhuang

Although the applications of artificial intelligence especially deep learning had greatly improved various aspects of intelligent manufacturing, they still face challenges for wide employment due to the poor generalization ability, difficulties to establish high-quality training datasets, and unsatisfactory performance of deep learning methods.

XTREME-S: Evaluating Cross-lingual Speech Representations

no code implementations21 Mar 2022 Alexis Conneau, Ankur Bapna, Yu Zhang, Min Ma, Patrick von Platen, Anton Lozhkov, Colin Cherry, Ye Jia, Clara Rivera, Mihir Kale, Daan van Esch, Vera Axelrod, Simran Khanuja, Jonathan H. Clark, Orhan Firat, Michael Auli, Sebastian Ruder, Jason Riesa, Melvin Johnson

Covering 102 languages from 10+ language families, 3 different domains and 4 task families, XTREME-S aims to simplify multilingual speech representation evaluation, as well as catalyze research in "universal" speech representation learning.

Representation Learning Retrieval +4

mSLAM: Massively multilingual joint pre-training for speech and text

no code implementations3 Feb 2022 Ankur Bapna, Colin Cherry, Yu Zhang, Ye Jia, Melvin Johnson, Yong Cheng, Simran Khanuja, Jason Riesa, Alexis Conneau

We present mSLAM, a multilingual Speech and LAnguage Model that learns cross-lingual cross-modal representations of speech and text by pre-training jointly on large amounts of unlabeled speech and text in multiple languages.

intent-classification Intent Classification +4

Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling

6 code implementations8 Oct 2020 Jonathan Shen, Ye Jia, Mike Chrzanowski, Yu Zhang, Isaac Elias, Heiga Zen, Yonghui Wu

This paper presents Non-Attentive Tacotron based on the Tacotron 2 text-to-speech model, replacing the attention mechanism with an explicit duration predictor.

Speech Recognition

Textual Echo Cancellation

no code implementations13 Aug 2020 Shaojin Ding, Ye Jia, Ke Hu, Quan Wang

In this paper, we propose Textual Echo Cancellation (TEC) - a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings.

Acoustic echo cancellation speech-recognition +1

Speech Recognition with Augmented Synthesized Speech

no code implementations25 Sep 2019 Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu

Recent success of the Tacotron speech synthesis architecture and its variants in producing natural sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers.

Data Augmentation Robust Speech Recognition +2

Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning

4 code implementations9 Jul 2019 Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Zhifeng Chen, RJ Skerry-Ryan, Ye Jia, Andrew Rosenberg, Bhuvana Ramabhadran

We present a multispeaker, multilingual text-to-speech (TTS) synthesis model based on Tacotron that is able to produce high quality speech in multiple languages.

Speech Synthesis Voice Cloning

Direct speech-to-speech translation with a sequence-to-sequence model

1 code implementation12 Apr 2019 Ye Jia, Ron J. Weiss, Fadi Biadsy, Wolfgang Macherey, Melvin Johnson, Zhifeng Chen, Yonghui Wu

We present an attention-based sequence-to-sequence neural network which can directly translate speech from one language into speech in another language, without relying on an intermediate text representation.

Speech Synthesis Speech-to-Speech Translation +3

LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech

5 code implementations5 Apr 2019 Heiga Zen, Viet Dang, Rob Clark, Yu Zhang, Ron J. Weiss, Ye Jia, Zhifeng Chen, Yonghui Wu

This paper introduces a new speech corpus called "LibriTTS" designed for text-to-speech use.

Sound Audio and Speech Processing

Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling

2 code implementations21 Feb 2019 Jonathan Shen, Patrick Nguyen, Yonghui Wu, Zhifeng Chen, Mia X. Chen, Ye Jia, Anjuli Kannan, Tara Sainath, Yuan Cao, Chung-Cheng Chiu, Yanzhang He, Jan Chorowski, Smit Hinsu, Stella Laurenzo, James Qin, Orhan Firat, Wolfgang Macherey, Suyog Gupta, Ankur Bapna, Shuyuan Zhang, Ruoming Pang, Ron J. Weiss, Rohit Prabhavalkar, Qiao Liang, Benoit Jacob, Bowen Liang, HyoukJoong Lee, Ciprian Chelba, Sébastien Jean, Bo Li, Melvin Johnson, Rohan Anil, Rajat Tibrewal, Xiaobing Liu, Akiko Eriguchi, Navdeep Jaitly, Naveen Ari, Colin Cherry, Parisa Haghani, Otavio Good, Youlong Cheng, Raziel Alvarez, Isaac Caswell, Wei-Ning Hsu, Zongheng Yang, Kuan-Chieh Wang, Ekaterina Gonina, Katrin Tomanek, Ben Vanik, Zelin Wu, Llion Jones, Mike Schuster, Yanping Huang, Dehao Chen, Kazuki Irie, George Foster, John Richardson, Klaus Macherey, Antoine Bruguier, Heiga Zen, Colin Raffel, Shankar Kumar, Kanishka Rao, David Rybach, Matthew Murray, Vijayaditya Peddinti, Maxim Krikun, Michiel A. U. Bacchiani, Thomas B. Jablin, Rob Suderman, Ian Williams, Benjamin Lee, Deepti Bhatia, Justin Carlson, Semih Yavuz, Yu Zhang, Ian McGraw, Max Galkin, Qi Ge, Golan Pundak, Chad Whipkey, Todd Wang, Uri Alon, Dmitry Lepikhin, Ye Tian, Sara Sabour, William Chan, Shubham Toshniwal, Baohua Liao, Michael Nirschl, Pat Rondon

Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence models.

Sequence-To-Sequence Speech Recognition

Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation

no code implementations5 Nov 2018 Ye Jia, Melvin Johnson, Wolfgang Macherey, Ron J. Weiss, Yuan Cao, Chung-Cheng Chiu, Naveen Ari, Stella Laurenzo, Yonghui Wu

In this paper, we demonstrate that using pre-trained MT or text-to-speech (TTS) synthesis models to convert weakly supervised data into speech-to-translation pairs for ST training can be more effective than multi-task learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Hierarchical Generative Modeling for Controllable Speech Synthesis

2 code implementations ICLR 2019 Wei-Ning Hsu, Yu Zhang, Ron J. Weiss, Heiga Zen, Yonghui Wu, Yuxuan Wang, Yuan Cao, Ye Jia, Zhifeng Chen, Jonathan Shen, Patrick Nguyen, Ruoming Pang

This paper proposes a neural sequence-to-sequence text-to-speech (TTS) model which can control latent attributes in the generated speech that are rarely annotated in the training data, such as speaking style, accent, background noise, and recording conditions.

Attribute Speech Synthesis

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

4 code implementations11 Oct 2018 Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.

Speaker Recognition Speaker Separation +3

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

11 code implementations ICML 2018 Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Fei Ren, Ye Jia, Rif A. Saurous

In this work, we propose "global style tokens" (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system.

Speech Synthesis Style Transfer +1

Cannot find the paper you are looking for? You can Submit a new open access paper.