Search Results for author: Khe Chai Sim

Found 29 papers, 1 paper with code

Understanding Recurrent Neural State Using Memory Signatures

no code implementations • 11 Feb 2018 • Skanda Koppula, Khe Chai Sim, Kean Chin

We demonstrate this method's usefulness in revealing information divergence in the bases of recurrent factorized kernels, visualizing the character-level differences between the memory of n-gram and recurrent language models, and extracting knowledge of history encoded in the layers of grapheme-based end-to-end ASR networks.

Toward domain-invariant speech recognition via large scale training

no code implementations • 16 Aug 2018 • Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

More importantly, such models generalize better to unseen conditions and allow for rapid adaptation -- we show that by using as little as 10 hours of data from a new domain, an adapted domain-invariant model can match the performance of a domain-specific model trained from scratch using 70 times as much data.

Automatic Speech Recognition (ASR) +1

Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model

no code implementations • 5 Dec 2017 • Bo Li, Tara N. Sainath, Khe Chai Sim, Michiel Bacchiani, Eugene Weinstein, Patrick Nguyen, Zhifeng Chen, Yonghui Wu, Kanishka Rao

Sequence-to-sequence models provide a simple and elegant solution for building speech recognition systems by folding separate components of a typical system, namely acoustic (AM), pronunciation (PM) and language (LM) models into a single neural network.

Speech Recognition
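
As a concrete illustration of the folding described in this entry, here is a minimal sketch of a single attention-based sequence-to-sequence network that maps acoustic frames directly to grapheme logits, taking over the roles of the separate AM, PM and LM. This is not the paper's architecture; the layer sizes, vocabulary and the idea of feeding the dialect as a learned embedding are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class TinySeq2SeqASR(nn.Module):
    """Illustrative single network: acoustic frames in, grapheme logits out."""

    def __init__(self, feat_dim=80, hidden=256, vocab_size=32, num_dialects=7):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.dialect_emb = nn.Embedding(num_dialects, hidden)   # assumed way of conditioning on dialect
        self.grapheme_emb = nn.Embedding(vocab_size, hidden)
        self.decoder = nn.LSTMCell(2 * hidden, hidden)
        self.attn_proj = nn.Linear(hidden, hidden)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, feats, dialect_id, targets):
        enc, _ = self.encoder(feats)                       # (B, T, H): plays the acoustic-model role
        h = self.dialect_emb(dialect_id)                   # (B, H): decoder state seeded with the dialect
        c = torch.zeros_like(h)
        logits = []
        for t in range(targets.size(1)):                   # teacher-forced decoding
            scores = torch.einsum("bth,bh->bt", self.attn_proj(enc), h)
            context = torch.einsum("bt,bth->bh", scores.softmax(dim=-1), enc)
            step_in = torch.cat([self.grapheme_emb(targets[:, t]), context], dim=-1)
            h, c = self.decoder(step_in, (h, c))
            logits.append(self.out(h))                     # grapheme posteriors: PM + LM roles folded in
        return torch.stack(logits, dim=1)                  # (B, U, vocab_size)

# Toy usage: two utterances of 100 frames with 80-dim filterbanks, 12 target graphemes.
model = TinySeq2SeqASR()
feats = torch.randn(2, 100, 80)
dialect_id = torch.tensor([0, 3])
targets = torch.randint(0, 32, (2, 12))
print(model(feats, dialect_id, targets).shape)             # torch.Size([2, 12, 32])
```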

An Investigation Into On-device Personalization of End-to-end Automatic Speech Recognition Models

no code implementations • 14 Sep 2019 • Khe Chai Sim, Petr Zadrazil, Françoise Beaufays

Speaker-independent speech recognition systems trained with data from many users are generally robust against speaker variability and work well for a large population of speakers.

Automatic Speech Recognition (ASR) +1

On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

no code implementations • 18 Jun 2021 • Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim

While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns.

Automatic Speech Recognition (ASR) +1

Large-scale ASR Domain Adaptation using Self- and Semi-supervised Learning

no code implementations • 1 Oct 2021 • Dongseong Hwang, Ananya Misra, Zhouyuan Huo, Nikhil Siddhartha, Shefali Garg, David Qiu, Khe Chai Sim, Trevor Strohman, Françoise Beaufays, Yanzhang He

Self- and semi-supervised learning methods have been actively investigated to reduce the amount of labeled training data required or to enhance model performance.

Domain Adaptation

Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

no code implementations • 5 Oct 2021 • Tsendsuren Munkhdalai, Khe Chai Sim, Angad Chandorkar, Fan Gao, Mason Chua, Trevor Strohman, Françoise Beaufays

Fast contextual adaptation has been shown to be effective in improving Automatic Speech Recognition (ASR) of rare words, and when combined with on-device personalized training it can yield even better recognition results.

Automatic Speech Recognition (ASR) +2
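
The "neural associative memory" in this entry's title can be pictured as a differentiable key-value store that the recognizer queries with its current decoder state to retrieve contextual information such as contact names. The sketch below is only a generic key-value attention read with assumed shapes and names, not the paper's model.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def associative_read(query, keys, values, scale=None):
    """Differentiable key-value memory lookup (a generic attention read).

    query:  (d,)   e.g. a decoder state
    keys:   (n, d) one row per stored context phrase
    values: (n, v) payload associated with each phrase (e.g. a bias embedding)
    """
    scale = scale or np.sqrt(query.shape[-1])
    weights = softmax(keys @ query / scale)   # (n,): similarity of the query to each key
    return weights @ values                   # (v,): convex combination of stored values

# Toy usage: 5 stored contact-name phrases with 16-dim keys and 8-dim values.
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(5, 16)), rng.normal(size=(5, 8))
query = keys[2] + 0.1 * rng.normal(size=16)   # a query close to the third stored phrase
print(associative_read(query, keys, values))  # retrieval dominated by values[2]
```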

Joint Unsupervised and Supervised Training for Multilingual ASR

no code implementations • 15 Nov 2021 • Junwen Bai, Bo Li, Yu Zhang, Ankur Bapna, Nikhil Siddhartha, Khe Chai Sim, Tara N. Sainath

Our average WER across all languages outperforms the average monolingual baseline by 33.3%, and the state-of-the-art 2-stage XLSR by 32%.

Language Modelling · Masked Language Modeling +3

Pseudo Label Is Better Than Human Label

no code implementations • 22 Mar 2022 • Dongseong Hwang, Khe Chai Sim, Zhouyuan Huo, Trevor Strohman

State-of-the-art automatic speech recognition (ASR) systems are trained with tens of thousands of hours of labeled speech data.

Automatic Speech Recognition (ASR) +2
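
The pseudo-labeling idea in this entry's title is to let an existing (teacher) model label large amounts of unlabeled audio and train on those machine-generated targets instead of human transcriptions. The sketch below shows the generic confidence-filtered pseudo-labeling step on toy classification-style logits; the threshold, shapes and the teacher itself are assumptions, and the paper's actual ASR recipe is not reproduced.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def pseudo_label(teacher_logits, confidence_threshold=0.9):
    """Turn teacher predictions on unlabeled data into training targets.

    teacher_logits: (num_examples, num_classes) scores from a teacher model
    run on unlabeled data. Returns indices of accepted examples and labels.
    """
    probs = softmax(teacher_logits)
    labels = probs.argmax(axis=-1)
    confidence = probs.max(axis=-1)
    keep = confidence >= confidence_threshold   # drop low-confidence predictions
    return np.flatnonzero(keep), labels[keep]

# Toy usage: the student is then trained on (unlabeled[idx], labels) exactly as
# if the pseudo labels were human transcriptions.
rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(1000, 30)) * 3.0
idx, labels = pseudo_label(teacher_logits)
print(f"kept {idx.size}/1000 utterances as pseudo-labeled training data")
```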

Large vocabulary speech recognition for languages of Africa: multilingual modeling and self-supervised learning

no code implementations • 5 Aug 2022 • Sandy Ritchie, You-Chi Cheng, Mingqing Chen, Rajiv Mathews, Daan van Esch, Bo Li, Khe Chai Sim

Almost none of the 2,000+ languages spoken in Africa have widely available automatic speech recognition systems, and the data required to build them is available for only a few languages.

Automatic Speech Recognition (ASR) +2

Comparison of Soft and Hard Target RNN-T Distillation for Large-scale ASR

no code implementations • 11 Oct 2022 • Dongseong Hwang, Khe Chai Sim, Yu Zhang, Trevor Strohman

Knowledge distillation is an effective machine learning technique to transfer knowledge from a teacher model to a smaller student model, especially with unlabeled data.

Automatic Speech Recognition (ASR) +3
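
The soft/hard distinction in this entry's title refers to whether the student matches the teacher's full output distribution or only its argmax label. RNN-T distillation operates on transducer lattices, which is not shown here; the sketch below only illustrates the two losses on generic per-frame logits, with the temperature chosen arbitrarily.

```python
import torch
import torch.nn.functional as F

def distillation_losses(student_logits, teacher_logits, temperature=2.0):
    """Generic soft- vs hard-target distillation on (batch, num_classes) logits."""
    # Hard targets: treat the teacher's argmax as if it were a ground-truth label.
    hard_targets = teacher_logits.argmax(dim=-1)
    hard_loss = F.cross_entropy(student_logits, hard_targets)

    # Soft targets: match the full, temperature-smoothed teacher distribution.
    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    soft_loss = F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature**2

    return hard_loss, soft_loss

# Toy usage with random logits for 8 frames and a 42-symbol vocabulary.
student = torch.randn(8, 42, requires_grad=True)
teacher = torch.randn(8, 42)
hard, soft = distillation_losses(student, teacher)
(hard + soft).backward()
```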

Resource-Efficient Transfer Learning From Speech Foundation Model Using Hierarchical Feature Fusion

no code implementations • 4 Nov 2022 • Zhouyuan Huo, Khe Chai Sim, Bo Li, Dongseong Hwang, Tara N. Sainath, Trevor Strohman

Experimental results show that the proposed method achieves better performance on the speech recognition task than existing algorithms, with fewer trainable parameters, less computational memory and faster training.

Automatic Speech Recognition (ASR) +2
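
One common form of hierarchical feature fusion is a learned weighted sum over the hidden states of several frozen foundation-model layers feeding a small trainable head, so that only a handful of parameters are updated. The sketch below follows that generic pattern under assumed shapes; it is an illustration, not the paper's exact fusion scheme.

```python
import torch
import torch.nn as nn

class WeightedLayerFusion(nn.Module):
    """Fuse hidden states from several frozen encoder layers with learned scalar
    weights, then apply a small trainable head -- few trainable parameters overall."""

    def __init__(self, num_layers, hidden_dim, num_outputs):
        super().__init__()
        self.layer_logits = nn.Parameter(torch.zeros(num_layers))  # one weight per frozen layer
        self.head = nn.Linear(hidden_dim, num_outputs)             # small task head

    def forward(self, layer_states):
        # layer_states: (num_layers, batch, time, hidden_dim) from the frozen model
        w = torch.softmax(self.layer_logits, dim=0)                # convex combination of layers
        fused = torch.einsum("l,lbth->bth", w, layer_states)
        return self.head(fused)                                    # (batch, time, num_outputs)

# Toy usage with made-up shapes: 12 frozen layers, 8 utterances, 50 frames, 256 dims.
states = torch.randn(12, 8, 50, 256)
model = WeightedLayerFusion(num_layers=12, hidden_dim=256, num_outputs=42)
print(model(states).shape)                          # torch.Size([8, 50, 42])
print(sum(p.numel() for p in model.parameters()))   # only fusion weights + head are trained
```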

Edit Distance based RL for RNNT decoding

no code implementations • 31 May 2023 • Dongseong Hwang, Changwan Ryu, Khe Chai Sim

RNN-T is currently considered the industry standard in ASR due to its exceptional WERs in various benchmark tests and its ability to support seamless streaming and longform transcription.
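
The edit (Levenshtein) distance named in this entry's title is the quantity word error rate is built from, and a negated, length-normalized version of it is the kind of reward an RL objective over RNN-T hypotheses could maximize. Below is a standard dynamic-programming implementation plus such a toy reward; how the paper actually plugs this into RNN-T training is not reproduced here.

```python
def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (the numerator of WER)."""
    h, r = hyp.split(), ref.split()
    prev = list(range(len(r) + 1))           # distances for the empty hypothesis prefix
    for i, hw in enumerate(h, start=1):
        curr = [i] + [0] * len(r)
        for j, rw in enumerate(r, start=1):
            curr[j] = min(
                prev[j] + 1,                 # extra hypothesis word (insertion error)
                curr[j - 1] + 1,             # missing reference word (deletion error)
                prev[j - 1] + (hw != rw),    # substitution, free if the words match
            )
        prev = curr
    return prev[-1]

def negative_wer_reward(hyp, ref):
    """A reward an RL objective could maximize: fewer word errors, higher reward."""
    return -edit_distance(hyp, ref) / max(len(ref.split()), 1)

print(edit_distance("play the song", "play that song"))        # 1
print(negative_wer_reward("play the song", "play that song"))  # -0.333...
```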

Massive End-to-end Models for Short Search Queries

no code implementations • 22 Sep 2023 • Weiran Wang, Rohit Prabhavalkar, Dongseong Hwang, Qiujia Li, Khe Chai Sim, Bo Li, James Qin, Xingyu Cai, Adam Stooke, Zhong Meng, CJ Zheng, Yanzhang He, Tara Sainath, Pedro Moreno Mengibar

In this work, we investigate two popular end-to-end automatic speech recognition (ASR) models, namely Connectionist Temporal Classification (CTC) and RNN-Transducer (RNN-T), for offline recognition of voice search queries, with up to 2B model parameters.

Automatic Speech Recognition (ASR) +1
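
For reference, the CTC criterion compared in this entry is available off the shelf as torch.nn.CTCLoss; the snippet below only demonstrates the tensor shapes it expects on random toy data and says nothing about the paper's 2B-parameter models or the RNN-T counterpart.

```python
import torch
import torch.nn as nn

# CTC needs per-frame log-probabilities over the vocabulary plus a blank symbol.
T, N, C = 50, 4, 32            # frames, batch size, vocab size (index 0 = blank)
S = 12                         # maximum target length in this toy batch

log_probs = torch.randn(T, N, C).log_softmax(dim=-1).requires_grad_()
targets = torch.randint(1, C, (N, S))                 # grapheme ids, excluding blank
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.randint(5, S + 1, (N,), dtype=torch.long)

ctc = nn.CTCLoss(blank=0, zero_infinity=True)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()
print(float(loss))
```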

Contextual Biasing with the Knuth-Morris-Pratt Matching Algorithm

no code implementations • 29 Sep 2023 • Weiran Wang, Zelin Wu, Diamantino Caseiro, Tsendsuren Munkhdalai, Khe Chai Sim, Pat Rondon, Golan Pundak, Gan Song, Rohit Prabhavalkar, Zhong Meng, Ding Zhao, Tara Sainath, Pedro Moreno Mengibar

Contextual biasing refers to the problem of biasing automatic speech recognition (ASR) systems towards rare entities that are relevant to a specific user or application scenario.

Automatic Speech Recognition (ASR) +1
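
The Knuth-Morris-Pratt algorithm named in this entry's title matches a pattern against a stream in linear time without re-reading earlier symbols, which is what makes it attractive for checking partial ASR hypotheses against a list of biasing phrases. Below is a standard KMP matcher with an invented phrase list; the biasing and rescoring machinery the paper builds on top of it is not shown.

```python
def kmp_failure(pattern):
    """Length of the longest proper prefix of pattern[:i+1] that is also a suffix."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_find(text, pattern):
    """Return the start index of the first occurrence of pattern in text, else -1."""
    if not pattern:
        return 0
    fail, k = kmp_failure(pattern), 0
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = fail[k - 1]        # fall back without re-reading earlier characters
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return i - k + 1
    return -1

# Toy usage: check whether a partial hypothesis contains a biasing phrase.
biasing_phrases = ["khe chai", "pedro moreno"]          # invented contact names
hypothesis = "call khe chai about the meeting"
print([p for p in biasing_phrases if kmp_find(hypothesis, p) != -1])  # ['khe chai']
```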

Profit: Benchmarking Personalization and Robustness Trade-off in Federated Prompt Tuning

no code implementations • 6 Oct 2023 • Liam Collins, Shanshan Wu, Sewoong Oh, Khe Chai Sim

In many applications of federated learning (FL), clients desire models that are personalized using their local data, yet are also robust in the sense that they retain general global knowledge.

Benchmarking · Federated Learning +2
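
The personalization-versus-robustness tension described in this entry can be seen even in plain federated averaging followed by local fine-tuning. The sketch below does exactly that on an invented least-squares problem; it is not the Profit benchmark and involves no prompt tuning, just a minimal numpy illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: each client fits a linear model y = X @ w on slightly different data.
num_clients, dim = 4, 8
true_global = rng.normal(size=dim)
clients = []
for _ in range(num_clients):
    w_local = true_global + 0.5 * rng.normal(size=dim)    # client-specific shift
    X = rng.normal(size=(100, dim))
    clients.append((X, X @ w_local + 0.1 * rng.normal(size=100)))

def local_sgd(w, X, y, steps=20, lr=0.01):
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# Federated averaging: a robust global model that retains knowledge from all clients.
w_global = np.zeros(dim)
for _ in range(30):                                        # communication rounds
    updates = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(updates, axis=0)

# Personalization: fine-tune the global model on one client's own data.
X0, y0 = clients[0]
w_personal = local_sgd(w_global.copy(), X0, y0, steps=100)

mse = lambda w, X, y: float(np.mean((X @ w - y) ** 2))
print("client 0, global model:      ", mse(w_global, X0, y0))
print("client 0, personalized model:", mse(w_personal, X0, y0))
```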

TransformerFAM: Feedback attention is working memory

no code implementations • 14 Apr 2024 • Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, Khe Chai Sim, Pedro Moreno Mengibar

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs.
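
The quadratic cost mentioned in this entry comes from the L-by-L attention score matrix that standard self-attention materializes. The snippet below computes that matrix for a small example and tallies its memory at longer context lengths; it does not implement the paper's feedback-attention mechanism.

```python
import numpy as np

def scaled_dot_product_scores(q, k):
    """Standard attention scores: an (L, L) matrix, hence O(L^2) time and memory."""
    return q @ k.T / np.sqrt(q.shape[-1])

# A small example actually materializes the L x L matrix ...
L, d = 512, 64
q = np.random.default_rng(0).normal(size=(L, d)).astype(np.float32)
scores = scaled_dot_product_scores(q, q)
print(scores.shape)                       # (512, 512)

# ... and the quadratic growth is what bites at long context lengths:
for L in (1_024, 8_192, 65_536):
    mib = L * L * 4 / 2**20               # float32 scores for a single head
    print(f"L={L:>6}: {mib:>10,.0f} MiB for the score matrix of one head")
```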
