no code implementations • 12 Dec 2023 • Jianwei Zhang, Helian Feng, Xin He, Grant P. Strimel, Farhad Ghassemi, Ali Kebarighotbi
We present a novel search optimization solution for approximate nearest neighbor (ANN) search on resource-constrained edge devices.
no code implementations • 12 May 2023 • Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow
Machine learning model weights and activations are represented in full-precision during training.
no code implementations • 9 May 2023 • Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris
Prior approaches typically relied on subword encoders for encoding the bias phrases.
no code implementations • 7 May 2023 • Grant P. Strimel, Yi Xie, Brian King, Martin Radfar, Ariya Rastrow, Athanasios Mouchtaris
Streaming speech recognition architectures are employed for low-latency, real-time applications.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 3 Apr 2023 • Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks.
no code implementations • 31 Mar 2023 • Feng-Ju Chang, Thejaswi Muniyappa, Kanthashree Mysore Sathyendra, Kai Wei, Grant P. Strimel, Ross McGowan
Specifically, it leverages dialog acts to select the most relevant user catalogs and creates queries based on both -- the audio as well as the semantic relationship between the carrier phrase and user catalogs to better guide the contextual biasing.
no code implementations • 17 Oct 2022 • Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
For on-device automatic speech recognition (ASR), quantization aware training (QAT) is ubiquitous to achieve the trade-off between model predictive performance and efficiency.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2022 • Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced where the encoder of RNN-T is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and between attention layers.
no code implementations • 5 Jul 2022 • Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel
We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 15 Jun 2022 • Christin Jose, Joseph Wang, Grant P. Strimel, Mohammad Omar Khursheed, Yuriy Mishchenko, Brian Kulis
We also show that when our approach is used in conjunction with a max-pooling loss, we are able to improve relative false accepts by 25 % at a fixed latency when compared to cross entropy loss.
no code implementations • 26 May 2022 • Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR) models is a challenge due to the lack of training data.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 11 May 2022 • Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Muller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo
Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems.
no code implementations • 1 Apr 2022 • Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra
In addition, the NLU model in the two-stage system is not streamable, as it must wait for the audio segments to complete processing, which ultimately impacts the latency of the SLU system.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
no code implementations • 13 Dec 2021 • Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel
Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio.
Natural Language Understanding Spoken Language Understanding
no code implementations • 3 Aug 2021 • Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow
We apply AmNets to the Recurrent Neural Network Transducer (RNN-T) to reduce compute cost and latency for an automatic speech recognition (ASR) task.
Ranked #55 on Speech Recognition on LibriSpeech test-clean
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 3 Aug 2021 • Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow
As more speech processing applications execute locally on edge devices, a set of resource constraints must be considered.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 3 Aug 2021 • Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow
We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed for improved inference time latency on speech recognition tasks.
no code implementations • 14 Jun 2021 • Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris
We find that tandem training of teacher and student encoders with an inplace encoder distillation outperforms the use of a pre-trained and static teacher transducer.
no code implementations • 6 Aug 2020 • Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris
We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease.
no code implementations • 19 Jul 2018 • Grant P. Strimel, Kanthashree Mysore Sathyendra, Stanislav Peshterliev
In this paper we investigate statistical model compression applied to natural language understanding (NLU) models.