Search Results for author: Jasha Droppo

Found 22 papers, 2 papers with code

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations · 4 Nov 2022 · Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech, we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.
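The gains quoted above are relative reductions against a baseline word error rate; a minimal sketch of the arithmetic (the 20.0 baseline WER below is a hypothetical number, not from the paper):

```python
def relative_reduction(baseline, improved):
    """Fractional improvement of an error rate relative to its baseline."""
    return (baseline - improved) / baseline

# Hypothetical WERs: a 20.0 baseline improved to 18.86 is a 5.7% relative
# reduction, matching the size of the gain reported above.
print(round(relative_reduction(20.0, 18.86), 3))  # 0.057
```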

Automatic Speech Recognition speech-recognition

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations · 22 Oct 2022 · Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.
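CPC's mutual-information bound is typically optimized with the InfoNCE loss: a context vector must pick out the true future representation from a set of distractors. A minimal sketch with plain dot-product scores (illustrative only, not the authors' code):

```python
import math

def info_nce(context, candidates, positive_idx):
    """Softmax cross-entropy over dot-product scores: identify the true
    future representation (positive_idx) among distractor candidates."""
    scores = [sum(c * z for c, z in zip(context, cand)) for cand in candidates]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    return -math.log(exps[positive_idx] / sum(exps))

# The loss is low when the positive candidate aligns with the context ...
aligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_idx=0)
# ... and high when a distractor aligns better.
misaligned = info_nce([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], positive_idx=1)
print(aligned < misaligned)  # True
```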

Automatic Speech Recognition Representation Learning +1

Adversarial Reweighting for Speaker Verification Fairness

no code implementations · 15 Jul 2022 · Minho Jin, Chelsea J.-T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke

Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively.
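Equal error rate (EER), the metric above, is the operating point where the false-accept rate on impostor trials equals the false-reject rate on genuine trials. A coarse threshold-sweep sketch (illustrative, not the evaluation code used in the paper):

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Sweep candidate thresholds and return the point where the
    false-accept and false-reject rates are closest to equal."""
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine_scores + impostor_scores)):
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Perfectly separated score distributions give an EER of 0.0.
print(equal_error_rate([0.9, 0.8, 0.7], [0.3, 0.2, 0.1]))  # 0.0
```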

Fairness Metric Learning +1

Investigation of Training Label Error Impact on RNN-T

no code implementations · 1 Dec 2021 · I-Fan Chen, Brian King, Jasha Droppo

In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T-based ASR models.

CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

no code implementations · 14 Jun 2021 · Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

We find that tandem training of teacher and student encoders with an in-place encoder distillation outperforms the use of a pre-trained and static teacher transducer.
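Encoder distillation of this kind is commonly driven by a KL-divergence term that pulls the student's output distribution toward the teacher's; a generic sketch over raw logits (not the paper's exact objective):

```python
import math

def softmax(logits):
    m = max(logits)  # shift by the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_kl(teacher_logits, student_logits):
    """KL(teacher || student): zero when the student matches the teacher."""
    p, q = softmax(teacher_logits), softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

print(round(distillation_kl([2.0, 0.5], [2.0, 0.5]), 6))  # 0.0
print(distillation_kl([2.0, 0.5], [0.5, 2.0]) > 0)        # True
```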

Knowledge Distillation speech-recognition +1

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations · 14 Jun 2021 · Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise for training state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.

Automatic Speech Recognition Continual Learning +2

Scaling Laws for Acoustic Models

no code implementations · 11 Jun 2021 · Jasha Droppo, Oguz Elibol

We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task.
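A joint prediction of this form is usually written as an additive power law with an irreducible floor; a sketch with made-up coefficients (the paper's fitted values are not reproduced here):

```python
def predicted_loss(model_size, data_size,
                   a=100.0, alpha=0.5, b=200.0, beta=0.5, floor=1.0):
    """Additive power law: loss falls with model size and data size
    toward an irreducible floor. All coefficients are hypothetical."""
    return floor + a / model_size ** alpha + b / data_size ** beta

# Loss decreases monotonically in both axes and approaches the floor.
small = predicted_loss(1e4, 1e4)
large = predicted_loss(1e12, 1e12)
print(small > large)            # True
print(abs(large - 1.0) < 1e-3)  # True
```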

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

no code implementations · 10 Jun 2021 · Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

This paper proposes a new neural text-to-speech model that approaches the disentanglement problem by conditioning a Tacotron2-like architecture on flow-normalized speaker embeddings, and by substituting the reference encoder with a new learned latent distribution responsible for modeling the intra-sentence variability due to prosody.
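A normalizing flow maps embeddings through an invertible transform while tracking the log-determinant of the Jacobian, so densities can be evaluated exactly. The simplest possible step, an element-wise affine map (a toy illustration, far simpler than the flows used in the paper):

```python
import math

def affine_flow_forward(z, scale, shift):
    """x = scale * z + shift, with the log|det Jacobian| of the map."""
    x = [scale * zi + shift for zi in z]
    log_det = len(z) * math.log(abs(scale))
    return x, log_det

def affine_flow_inverse(x, scale, shift):
    """Exact inverse, which is what makes density evaluation tractable."""
    return [(xi - shift) / scale for xi in x]

z = [0.5, -1.0, 2.0]
x, log_det = affine_flow_forward(z, scale=2.0, shift=1.0)
print(affine_flow_inverse(x, scale=2.0, shift=1.0))  # [0.5, -1.0, 2.0]
```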

Disentanglement

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations · 4 Jun 2021 · Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition speech-recognition

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations · 9 Mar 2021 · Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.
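The quantization step and the consistency objective can be sketched in miniature: features are snapped to the nearest codebook entry, and a reconstruction loss ties those codes back to the input, VQ-VAE style (an illustrative toy, not the wav2vec-C architecture):

```python
def quantize(vector, codebook):
    """Nearest-neighbour codebook lookup by squared Euclidean distance."""
    return min(codebook,
               key=lambda code: sum((v - c) ** 2 for v, c in zip(vector, code)))

def consistency_loss(features, reconstruction):
    """Mean squared error between input features and their reconstruction."""
    return sum((f - r) ** 2 for f, r in zip(features, reconstruction)) / len(features)

codebook = [[1.0, 0.0], [0.0, 1.0]]
print(quantize([0.9, 0.2], codebook))            # [1.0, 0.0]
print(consistency_loss([1.0, 0.0], [1.0, 0.0]))  # 0.0
```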

Quantization Representation Learning +1

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations · 12 Feb 2021 · Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities, from speech, and are essential components of voice-activated systems.

Automatic Speech Recognition Natural Language Understanding +2

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations · 29 Dec 2020 · Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations · 27 Jul 2020 · Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.
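With full alignment scores summed per hypothesis, the minimum word error rate objective reduces to a posterior-weighted expected error over the N-best list; a minimal sketch (the hypothesis scores below are illustrative log-likelihoods, not from the paper):

```python
import math

def expected_word_errors(nbest):
    """nbest: list of (log_score, word_errors) pairs. Softmax-normalize
    the scores into posteriors and take the expected error count."""
    m = max(score for score, _ in nbest)  # shift for numerical stability
    weights = [math.exp(score - m) for score, _ in nbest]
    total = sum(weights)
    return sum(w / total * errors for w, (_, errors) in zip(weights, nbest))

# Two equally likely hypotheses with 0 and 2 word errors average to 1.0.
print(expected_word_errors([(-3.0, 0), (-3.0, 2)]))  # 1.0
```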

speech-recognition Speech Recognition

Acoustic-To-Word Model Without OOV

no code implementations · 28 Nov 2017 · Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all remaining words into an OOV output node.
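The OOV behaviour described above amounts to a lookup that collapses every unseen word onto a single output node; a small sketch (the vocabulary and ids are made up):

```python
def to_output_ids(words, vocab, oov_id=0):
    """Map each word to its output node; anything outside the
    vocabulary collapses onto the shared OOV node."""
    return [vocab.get(word, oov_id) for word in words]

vocab = {"<oov>": 0, "hello": 1, "world": 2}
print(to_output_ids(["hello", "zyzzyva", "world"], vocab))  # [1, 0, 2]
```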

Comparing Human and Machine Errors in Conversational Speech Transcription

no code implementations · 29 Aug 2017 · Andreas Stolcke, Jasha Droppo

In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

no code implementations · 21 Jul 2017 · Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion.

Automatic Speech Recognition speech-recognition +2

On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation

1 code implementation · 19 Feb 2016 · Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu

We propose to train a bi-directional neural network language model (NNLM) with noise contrastive estimation (NCE).
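NCE turns language-model training into binary classification: each sample is scored as data versus one of k draws from a noise distribution, which avoids normalizing over the full vocabulary. A minimal per-sample loss sketch (k and the noise log-probability below are illustrative):

```python
import math

def nce_loss(model_score, noise_logprob, is_data, k):
    """Logistic loss on the log-odds that a sample came from the data
    distribution rather than one of k noise samples."""
    logit = model_score - (math.log(k) + noise_logprob)
    p_data = 1.0 / (1.0 + math.exp(-logit))
    return -math.log(p_data) if is_data else -math.log(1.0 - p_data)

# A data sample the model scores highly incurs a small loss ...
confident = nce_loss(5.0, noise_logprob=-2.0, is_data=True, k=10)
# ... while the same sample scored poorly incurs a large one.
unsure = nce_loss(-5.0, noise_logprob=-2.0, is_data=True, k=10)
print(confident < unsure)  # True
```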

Language Modelling
