Search Results for author: Jasha Droppo

Found 24 papers, 2 papers with code

Federated Self-Learning with Weak Supervision for Speech Recognition

no code implementations21 Jun 2023 Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Low-footprint automatic speech recognition (ASR) models are increasingly being deployed on edge devices for conversational agents, which enhances privacy.

Automatic Speech Recognition (ASR) +4

Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech

no code implementations4 Nov 2022 Xin Zhang, Iván Vallés-Pérez, Andreas Stolcke, Chengzhu Yu, Jasha Droppo, Olabanji Shonibare, Roberto Barra-Chicote, Venkatesh Ravichandran

By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (<0.2% relative) degradation for fluent utterances.

Automatic Speech Recognition (ASR) +1

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations22 Oct 2022 Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.

Automatic Speech Recognition (ASR) +2
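The CPC objective mentioned in the entry above is contrastive. Below is a minimal, illustrative InfoNCE-style loss of the kind CPC builds on; the function name, shapes, and temperature are assumptions, and the paper's guided variant additionally uses supervision from a pre-trained ASR model, which is not modeled here.

```python
# Minimal InfoNCE-style contrastive loss (the family of objectives CPC uses).
# Purely illustrative; not the paper's guided pre-training recipe.
import torch
import torch.nn.functional as F

def info_nce_loss(context, targets, temperature=0.1):
    """context, targets: (batch, dim) projections of latent representations."""
    context = F.normalize(context, dim=-1)
    targets = F.normalize(targets, dim=-1)
    logits = context @ targets.t() / temperature  # similarity of every (context, target) pair
    labels = torch.arange(context.size(0))        # the matching target is the positive
    return F.cross_entropy(logits, labels)
```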

Adversarial Reweighting for Speaker Verification Fairness

no code implementations15 Jul 2022 Minho Jin, Chelsea J. -T. Ju, Zeya Chen, Yi-Chieh Liu, Jasha Droppo, Andreas Stolcke

Results show that the pairwise weighting method can achieve 1.08% overall EER, 1.25% for male and 0.67% for female speakers, with relative EER reductions of 7.7%, 10.1% and 3.0%, respectively.

Fairness Metric Learning +1

Investigation of Training Label Error Impact on RNN-T

no code implementations1 Dec 2021 I-Fan Chen, Brian King, Jasha Droppo

In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T based ASR models.

SynthASR: Unlocking Synthetic Data for Speech Recognition

no code implementations14 Jun 2021 Amin Fazel, Wei Yang, YuLan Liu, Roberto Barra-Chicote, Yixiong Meng, Roland Maas, Jasha Droppo

Our observations show that SynthASR holds great promise in training the state-of-the-art large-scale E2E ASR models for new applications while reducing the costs and dependency on production data.

Automatic Speech Recognition (ASR) +3

CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

no code implementations14 Jun 2021 Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

We find that tandem training of teacher and student encoders with an inplace encoder distillation outperforms the use of a pre-trained and static teacher transducer.

Knowledge Distillation Speech Recognition +1
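As a rough sketch of the co-learning idea in the entry above, the loss below combines teacher and student task losses with a term that pulls student encoder representations toward the teacher's while both are trained in tandem; the distance function, the detaching of the teacher, and the weighting are illustrative assumptions rather than the paper's exact recipe.

```python
# Illustrative tandem (co-learning) distillation loss; all choices here are assumptions.
import torch.nn.functional as F

def codistill_loss(teacher_task_loss, student_task_loss,
                   teacher_repr, student_repr, weight=1.0):
    # Pull student encoder outputs toward the jointly trained teacher's outputs.
    distill = F.mse_loss(student_repr, teacher_repr.detach())
    return teacher_task_loss + student_task_loss + weight * distill
```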

Scaling Laws for Acoustic Models

no code implementations11 Jun 2021 Jasha Droppo, Oguz Elibol

We extend previous work to jointly predict loss due to model size, to training set size, and to the inherent "irreducible loss" of the task.
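A common parameterization of such a joint scaling law adds an irreducible floor to power-law terms in model size and training-set size; the form below is illustrative and not necessarily the exact one fit in the paper.

```python
# Illustrative joint scaling law: irreducible loss plus power-law terms in
# model size N and training-set size D. Constants a, alpha, b, beta are fit to data.
def predicted_loss(N, D, l_irreducible, a, alpha, b, beta):
    return l_irreducible + a * N ** (-alpha) + b * D ** (-beta)
```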

Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows

no code implementations10 Jun 2021 Iván Vallés-Pérez, Julian Roth, Grzegorz Beringer, Roberto Barra-Chicote, Jasha Droppo

This paper proposes a new neural text-to-speech model that approaches the disentanglement problem by conditioning a Tacotron2-like architecture on flow-normalized speaker embeddings, and by substituting the reference encoder with a new learned latent distribution responsible for modeling the intra-sentence variability due to prosody.

Disentanglement Sentence
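Loosely, the conditioning described above can be pictured as sampling a prosody latent from a learned distribution (in place of a reference-encoder output) and combining it with a flow-normalized speaker embedding before decoding; the sketch below is an assumption-laden illustration, not the paper's architecture.

```python
# Illustrative conditioning: sample a latent prosody vector from a learned
# Gaussian and concatenate it with the speaker embedding. Names/shapes assumed.
import torch

def make_conditioning(speaker_emb, mu, log_var):
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * log_var)  # reparameterized sample
    return torch.cat([speaker_emb, z], dim=-1)
```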

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations4 Jun 2021 Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition (ASR) +1

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1
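A minimal sketch of the consistency idea described above: a small decoder reconstructs the input features from the quantized codes, and its reconstruction error is added to the usual contrastive objective. Module sizes, the MSE criterion, and the weighting are assumptions made for illustration, not the paper's exact configuration.

```python
# Illustrative consistency regularizer: reconstruct input features from quantized codes.
import torch.nn as nn
import torch.nn.functional as F

class ConsistencyDecoder(nn.Module):
    def __init__(self, code_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(code_dim, feat_dim), nn.ReLU(),
                                 nn.Linear(feat_dim, feat_dim))

    def forward(self, quantized_codes, input_features):
        # Reconstruction error of the input features from the quantized representations.
        return F.mse_loss(self.net(quantized_codes), input_features)

# total_loss = contrastive_loss + gamma * consistency_decoder(codes, features)
```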

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations12 Feb 2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition (ASR) +3

Detection of Lexical Stress Errors in Non-Native (L2) English with Data Augmentation and Attention

no code implementations29 Dec 2020 Daniel Korzekwa, Roberto Barra-Chicote, Szymon Zaporowski, Grzegorz Beringer, Jaime Lorenzo-Trueba, Alicja Serafinowicz, Jasha Droppo, Thomas Drugman, Bozena Kostek

This paper describes two novel complementary techniques that improve the detection of lexical stress errors in non-native (L2) English speech: attention-based feature extraction and data augmentation based on Neural Text-To-Speech (TTS).

Data Augmentation

Efficient minimum word error rate training of RNN-Transducer for end-to-end speech recognition

no code implementations27 Jul 2020 Jinxi Guo, Gautam Tiwari, Jasha Droppo, Maarten Van Segbroeck, Che-Wei Huang, Andreas Stolcke, Roland Maas

Unlike previous work on this topic, which performs on-the-fly limited-size beam-search decoding and generates alignment scores for expected edit-distance computation, in our proposed method, we re-calculate and sum scores of all the possible alignments for each hypothesis in N-best lists.

Speech Recognition
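For context, minimum word error rate (MWER) training minimizes the expected edit distance over an N-best list under renormalized hypothesis posteriors. The sketch below shows that general objective; the paper's contribution lies in how each hypothesis score is computed (by summing over all possible alignments) rather than in this formula, and the variance-reduction step shown is a common convention, not necessarily the paper's.

```python
# Generic expected-edit-distance (MWER-style) objective over an N-best list.
import torch

def mwer_loss(hyp_log_scores, hyp_word_errors):
    """hyp_log_scores: (n_best,) total log score of each hypothesis;
    hyp_word_errors: (n_best,) float tensor of word edit distances to the reference."""
    posteriors = torch.softmax(hyp_log_scores, dim=0)            # renormalize over the N-best list
    relative_errors = hyp_word_errors - hyp_word_errors.mean()   # common variance-reduction step
    return torch.sum(posteriors * relative_errors)
```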

Acoustic-To-Word Model Without OOV

no code implementations28 Nov 2017 Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can only model a limited number of words in the output layer and maps all remaining words to an OOV output node.

Comparing Human and Machine Errors in Conversational Speech Transcription

no code implementations29 Aug 2017 Andreas Stolcke, Jasha Droppo

In this paper we approach this question by comparing the output of our most accurate CTS recognition system to that of a standard speech transcription vendor pipeline.

Progressive Joint Modeling in Unsupervised Single-channel Overlapped Speech Recognition

no code implementations21 Jul 2017 Zhehuai Chen, Jasha Droppo, Jinyu Li, Wayne Xiong

We propose to advance the current state of the art by imposing a modular structure on the neural network, applying a progressive pretraining regimen, and improving the objective function with transfer learning and a discriminative training criterion.

Automatic Speech Recognition (ASR) +3

On Training Bi-directional Neural Network Language Model with Noise Contrastive Estimation

1 code implementation19 Feb 2016 Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu

We propose to train a bi-directional neural network language model (NNLM) with noise contrastive estimation (NCE).

Language Modelling
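As background, NCE sidesteps the full softmax over the vocabulary by training the model to discriminate observed words from samples drawn from a noise distribution. The sketch below shows a generic NCE loss; argument names and tensor shapes are assumptions, and it does not capture the bi-directional modeling that is the paper's focus.

```python
# Generic noise-contrastive estimation loss for a language model (illustrative only).
import math
import torch.nn.functional as F

def nce_loss(score_data, log_pn_data, score_noise, log_pn_noise, k):
    """score_data: (batch,) unnormalized LM log scores of the observed words;
    log_pn_data: (batch,) noise log-probabilities of those words;
    score_noise, log_pn_noise: (batch, k) the same quantities for k noise samples per word."""
    log_k = math.log(k)
    # Posterior log-probability that a word came from the data rather than the noise distribution.
    pos = F.logsigmoid(score_data - log_pn_data - log_k)         # observed words labeled "data"
    neg = F.logsigmoid(-(score_noise - log_pn_noise - log_k))    # noise samples labeled "noise"
    return -(pos + neg.sum(dim=1)).mean()
```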
