Search Results for author: Pedro Moreno

Found 9 papers, 0 papers with code

Maestro-U: Leveraging joint speech-text representation learning for zero supervised speech ASR

no code implementations18 Oct 2022 Zhehuai Chen, Ankur Bapna, Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Pedro Moreno, Nanxin Chen

First, we show that by combining speech representations with byte-level text representations and use of language embeddings, we can dramatically reduce the Character Error Rate (CER) on languages with no supervised speech from 64. 8\% to 30. 8\%, a relative reduction of 53\%.

Representation Learning speech-recognition +2

MAESTRO: Matched Speech Text Representations through Modality Matching

no code implementations7 Apr 2022 Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Pedro Moreno, Ankur Bapna, Heiga Zen

Self-supervised learning from speech signals aims to learn the latent structure inherent in the signal, while self-supervised learning from text attempts to capture lexical information.

Language Modelling Self-Supervised Learning +3

Ask2Mask: Guided Data Selection for Masked Speech Modeling

no code implementations24 Feb 2022 Murali Karthick Baskar, Andrew Rosenberg, Bhuvana Ramabhadran, Yu Zhang, Pedro Moreno

They treat all unsupervised speech samples with equal weight, which hinders learning as not all samples have relevant information to learn meaningful representations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Injecting Text in Self-Supervised Speech Pretraining

no code implementations27 Aug 2021 Zhehuai Chen, Yu Zhang, Andrew Rosenberg, Bhuvana Ramabhadran, Gary Wang, Pedro Moreno

The proposed method, tts4pretrain complements the power of contrastive learning in self-supervision with linguistic/lexical representations derived from synthesized speech, effectively learning from untranscribed speech and unspoken text.

Contrastive Learning Language Modelling +2

Speech Recognition with Augmented Synthesized Speech

no code implementations25 Sep 2019 Andrew Rosenberg, Yu Zhang, Bhuvana Ramabhadran, Ye Jia, Pedro Moreno, Yonghui Wu, Zelin Wu

Recent success of the Tacotron speech synthesis architecture and its variants in producing natural sounding multi-speaker synthesized speech has raised the exciting possibility of replacing expensive, manually transcribed, domain-specific, human speech that is used to train speech recognizers.

Data Augmentation Diversity +3

From Audio to Semantics: Approaches to end-to-end spoken language understanding

no code implementations24 Sep 2018 Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Multilingual Speech Recognition With A Single End-To-End Model

no code implementations6 Nov 2017 Shubham Toshniwal, Tara N. Sainath, Ron J. Weiss, Bo Li, Pedro Moreno, Eugene Weinstein, Kanishka Rao

Training a conventional automatic speech recognition (ASR) system to support multiple languages is challenging because the sub-word unit, lexicon and word inventories are typically language specific.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cannot find the paper you are looking for? You can Submit a new open access paper.