no code implementations • 20 Nov 2017 • Chung-Cheng Chiu, Anshuman Tripathi, Katherine Chou, Chris Co, Navdeep Jaitly, Diana Jaunzeikare, Anjuli Kannan, Patrick Nguyen, Hasim Sak, Ananth Sankar, Justin Tansuwan, Nathan Wan, Yonghui Wu, Xuedong Zhang
We explored both CTC and LAS systems for building speech recognition models.
Automatic Speech Recognition (ASR) +1
no code implementations • 31 Oct 2016 • Hagen Soltau, Hank Liao, Hasim Sak
We present results that show it is possible to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with whole words as acoustic units.
no code implementations • 10 Mar 2016 • Ian McGraw, Rohit Prabhavalkar, Raziel Alvarez, Montse Gonzalez Arenas, Kanishka Rao, David Rybach, Ouais Alsharif, Hasim Sak, Alexander Gruenstein, Francoise Beaufays, Carolina Parada
We describe a large vocabulary speech recognition system that is accurate, has low latency, and yet has a small enough memory and computational footprint to run faster than real-time on a Nexus 5 Android smartphone.
no code implementations • ICLR 2019 • Brendan Shillingford, Yannis Assael, Matthew W. Hoffman, Thomas Paine, Cían Hughes, Utsav Prabhu, Hank Liao, Hasim Sak, Kanishka Rao, Lorrayne Bennett, Marie Mulville, Ben Coppin, Ben Laurie, Andrew Senior, Nando de Freitas
To achieve this, we constructed the largest existing visual speech recognition dataset, consisting of pairs of text and video clips of faces speaking (3,886 hours of video).
Ranked #11 on Lipreading on LRS3-TED (using extra training data)
no code implementations • 17 Jun 2019 • Ke Hu, Hasim Sak, Hank Liao
In this work, we apply the domain adversarial network to encourage the shared layers of a multilingual model to learn language-invariant features.
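A common way to realize this kind of domain adversarial training is a gradient-reversal layer: the shared encoder feeds a language classifier, but the classifier's gradient is flipped before it reaches the shared layers, pushing them toward language-invariant features. Below is a minimal, framework-free sketch of the gradient-reversal idea; the function names and the `lam` scaling parameter are illustrative, not taken from the paper.

```python
import numpy as np

def grad_reverse_forward(x):
    # Forward pass: the identity, so downstream layers see the
    # shared features unchanged.
    return x

def grad_reverse_backward(grad, lam=1.0):
    # Backward pass: flip the sign of the incoming gradient (scaled
    # by lam), so minimizing the language classifier's loss
    # *maximizes* it with respect to the shared encoder.
    return -lam * grad
```

In a real system this pair would be wrapped as a custom autograd op in the training framework, sitting between the shared encoder and the adversarial language classifier.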
Automatic Speech Recognition (ASR) +2
no code implementations • 26 Feb 2020 • Erik McDermott, Hasim Sak, Ehsan Variani
The proposed approach is evaluated in cross-domain and limited-data scenarios, for which a significant amount of target domain text data is used for LM training, but only limited (or no) {audio, transcript} training data pairs are used to train the RNN-T.
Automatic Speech Recognition (ASR) +2
no code implementations • 7 Oct 2020 • Anshuman Tripathi, Jaeyoung Kim, Qian Zhang, Han Lu, Hasim Sak
In this paper, we present a Transformer-Transducer model architecture and a training technique that unify streaming and non-streaming speech recognition into a single model.
no code implementations • 6 May 2021 • Jaeyoung Kim, Han Lu, Anshuman Tripathi, Qian Zhang, Hasim Sak
On LibriSpeech evaluation, self-alignment outperformed existing schemes, with 25% and 56% less delay than FastEmit and constrained alignment, respectively, at a similar word error rate.
no code implementations • 27 May 2022 • Soheil Khorram, Jaeyoung Kim, Anshuman Tripathi, Han Lu, Qian Zhang, Hasim Sak
This paper introduces the contrastive siamese (c-siam) network, an architecture for leveraging unlabeled acoustic data in speech recognition.
5 code implementations • 7 Feb 2020 • Qian Zhang, Han Lu, Hasim Sak, Anshuman Tripathi, Erik McDermott, Stephen Koo, Shankar Kumar
We present results on the LibriSpeech dataset showing that limiting the left context for self-attention in the Transformer layers makes decoding computationally tractable for streaming, with only a slight degradation in accuracy.
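Limiting the left context amounts to masking self-attention so each frame attends only to a bounded window of past positions, which keeps per-step decoding cost constant for streaming. A minimal sketch of such a mask, with illustrative names (not the paper's implementation):

```python
import numpy as np

def streaming_attention_mask(seq_len, left_context):
    # mask[i, j] is True when position i may attend to position j:
    # causal (no future frames) and at most `left_context` steps back.
    i = np.arange(seq_len)[:, None]
    j = np.arange(seq_len)[None, :]
    return (j <= i) & (j >= i - left_context)
```

The resulting boolean matrix would typically be applied to the attention logits (masked positions set to -inf) before the softmax in each Transformer layer.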
1 code implementation • 23 Sep 2021 • Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Yiling Huang, Ignacio Lopez Moreno, Hasim Sak
In this paper, we present a novel speaker diarization system for streaming on-device applications.