Search Results for author: Erik McDermott

Found 8 papers, 1 paper with code

Focused Discriminative Training For Streaming CTC-Trained Automatic Speech Recognition Models

no code implementations · 23 Aug 2024 · Adnan Haider, Xingyu Na, Erik McDermott, Tim Ng, Zhen Huang, Xiaodan Zhuang

This paper introduces a novel training framework called Focused Discriminative Training (FDT) to further improve streaming word-piece end-to-end (E2E) automatic speech recognition (ASR) models trained using either CTC or an interpolation of CTC and attention-based encoder-decoder (AED) loss.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +4 more

Optimizing Byte-level Representation for End-to-end ASR

no code implementations · 14 Jun 2024 · Roger Hsiao, Liuhui Deng, Erik McDermott, Ruchir Travadi, Xiaodan Zhuang

Byte-level representation is often used by large scale multilingual ASR systems when the character set of the supported languages is large.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Denoising LM: Pushing the Limits of Error Correction Models for Speech Recognition

no code implementations · 24 May 2024 · Zijin Gu, Tatiana Likhomanenko, He Bai, Erik McDermott, Ronan Collobert, Navdeep Jaitly

Language models (LMs) have long been used to improve results of automatic speech recognition (ASR) systems, but they are unaware of the errors that ASR systems make.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Neural Transducer Training: Reduced Memory Consumption with Sample-wise Computation

no code implementations · 29 Nov 2022 · Stefan Braun, Erik McDermott, Roger Hsiao

As a highlight, we manage to compute the transducer loss and gradients for a batch size of 1024, and audio length of 40 seconds, using only 6 GB of memory.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1 more

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

no code implementations · 26 Feb 2020 · Erik McDermott, Hasim Sak, Ehsan Variani

The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2 more

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

5 code implementations · 7 Feb 2020 · Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar

We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.

speech-recognition, Speech Recognition
