Search Results for author: Hasim Sak

Found 9 papers, 1 paper with code

Reducing Streaming ASR Model Delay with Self Alignment

no code implementations • 6 May 2021 • Jaeyoung Kim, Han Lu, Anshuman Tripathi, Qian Zhang, Hasim Sak

In LibriSpeech evaluation, self alignment outperformed existing schemes, with 25% and 56% less delay than FastEmit and constrained alignment, respectively, at a similar word error rate.

End-To-End Speech Recognition

Transformer Transducer: One Model Unifying Streaming and Non-streaming Speech Recognition

no code implementations • 7 Oct 2020 • Anshuman Tripathi, Jaeyoung Kim, Qian Zhang, Han Lu, Hasim Sak

In this paper we present a Transformer-Transducer model architecture and a training technique to unify streaming and non-streaming speech recognition models into one model.

Speech Recognition

A Density Ratio Approach to Language Model Fusion in End-To-End Automatic Speech Recognition

no code implementations • 26 Feb 2020 • Erik McDermott, Hasim Sak, Ehsan Variani

The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.

End-To-End Speech Recognition, Language Modelling, +1

Transformer Transducer: A Streamable Speech Recognition Model with Transformer Encoders and RNN-T Loss

2 code implementations • 7 Feb 2020 • Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar

We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.

End-To-End Speech Recognition, Speech Recognition

Adversarial Training for Multilingual Acoustic Modeling

no code implementations • 17 Jun 2019 • Ke Hu, Hasim Sak, Hank Liao

In this work, we apply the domain adversarial network to encourage the shared layers of a multilingual model to learn language-invariant features.

Language Identification, Speech Recognition

Neural Speech Recognizer: Acoustic-to-Word LSTM Model for Large Vocabulary Speech Recognition

no code implementations • 31 Oct 2016 • Hagen Soltau, Hank Liao, Hasim Sak

We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units.

Language Modelling, Large Vocabulary Continuous Speech Recognition, +1

Personalized Speech recognition on mobile devices

no code implementations • 10 Mar 2016 • Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada

We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone.

Language Modelling, Large Vocabulary Continuous Speech Recognition, +1
