1 code implementation • 26 Jun 2024 • Yifan Yang, Kai Zhen, Ershad Banijamali, Athanasios Mouchtaris, Zheng Zhang
In this paper, we propose the Adaptive Zeroth-order Tensor-Train Adaption (AdaZeta) framework, specifically designed to improve the performance and convergence of zeroth-order (ZO) methods.
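The ZO methods the abstract refers to replace backpropagation with gradient estimates built purely from forward loss evaluations. A minimal sketch of the standard randomized two-point estimator (illustrative only, not AdaZeta's tensor-train variant):

```python
import numpy as np

def zo_gradient_estimate(loss_fn, params, mu=1e-3, num_samples=2000, seed=0):
    """Randomized zeroth-order (two-point) gradient estimate.

    Approximates the gradient of `loss_fn` at `params` using only
    function evaluations, averaged over random Gaussian directions.
    """
    rng = np.random.default_rng(seed)
    grad = np.zeros_like(params)
    for _ in range(num_samples):
        u = rng.standard_normal(params.shape)
        # Central-difference estimate of the directional derivative along u
        g = (loss_fn(params + mu * u) - loss_fn(params - mu * u)) / (2 * mu)
        grad += g * u
    return grad / num_samples

# Quadratic toy loss: the true gradient at x = 0 is [-2, -2, -2]
loss = lambda x: np.sum((x - 1.0) ** 2)
g = zo_gradient_estimate(loss, np.zeros(3))
print(g)  # approximately [-2, -2, -2], up to Monte Carlo noise
```

In expectation this estimator equals the true gradient; methods like AdaZeta are concerned with reducing its variance and memory cost at large model scale.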
no code implementations • 12 May 2023 • Suhaila M. Shakiah, Rupak Vignesh Swaminathan, Hieu Duy Nguyen, Raviteja Chinta, Tariq Afzal, Nathan Susanj, Athanasios Mouchtaris, Grant P. Strimel, Ariya Rastrow
Machine learning model weights and activations are represented in full-precision during training.
no code implementations • 9 May 2023 • Xuandi Fu, Kanthashree Mysore Sathyendra, Ankur Gandhe, Jing Liu, Grant P. Strimel, Ross McGowan, Athanasios Mouchtaris
Prior approaches typically relied on subword encoders for encoding the bias phrases.
no code implementations • 7 May 2023 • Grant P. Strimel, Yi Xie, Brian King, Martin Radfar, Ariya Rastrow, Athanasios Mouchtaris
Streaming speech recognition architectures are employed for low-latency, real-time applications.
Automatic Speech Recognition (ASR) +1
no code implementations • 3 Apr 2023 • Saumya Y. Sahai, Jing Liu, Thejaswi Muniyappa, Kanthashree M. Sathyendra, Anastasios Alexandridis, Grant P. Strimel, Ross McGowan, Ariya Rastrow, Feng-Ju Chang, Athanasios Mouchtaris, Siegfried Kunzmann
We present dual-attention neural biasing, an architecture designed to boost Wake Words (WW) recognition and improve inference time latency on speech recognition tasks.
no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
We integrate the multi-channel (MC) fusion networks into a conformer transducer model and train it in an end-to-end fashion.
no code implementations • 17 Oct 2022 • Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is the ubiquitous approach for balancing model predictive performance against efficiency.
Automatic Speech Recognition (ASR) +3
no code implementations • 29 Sep 2022 • Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris
Very recently, the Conformer architecture was introduced as an alternative to LSTM layers: the RNN-T encoder is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and between the attention layers.
no code implementations • 5 Jul 2022 • Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel
We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.
Automatic Speech Recognition (ASR) +1
no code implementations • 30 Jun 2022 • Kai Zhen, Hieu Duy Nguyen, Raviteja Chinta, Nathan Susanj, Athanasios Mouchtaris, Tariq Afzal, Ariya Rastrow
We present a novel sub-8-bit quantization-aware training (S8BQAT) scheme for 8-bit neural network accelerators.
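QAT schemes like the one described simulate low-precision arithmetic during training so the model adapts to quantization error. A minimal "fake quantization" sketch for a symmetric sub-8-bit grid (illustrative only; S8BQAT's actual scheme is more involved, and real QAT pairs this with a straight-through estimator in the backward pass):

```python
import numpy as np

def fake_quantize(x, num_bits=6):
    """Map values to a signed `num_bits` integer grid and back to float,
    so the forward pass "sees" the quantization error."""
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 31 for 6-bit
    scale = np.max(np.abs(x)) / qmax or 1.0  # avoid divide-by-zero on all-zero tensors
    q = np.clip(np.round(x / scale), -qmax - 1, qmax)
    return q * scale

w = np.array([-0.9, -0.31, 0.0, 0.42, 0.9])
wq = fake_quantize(w, num_bits=6)  # each entry within scale/2 of the original
```

Lowering `num_bits` below 8 shrinks the grid, which is why sub-8-bit training needs extra care to preserve accuracy.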
no code implementations • 26 May 2022 • Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Feng-Ju Chang, Jing Liu, Jinru Su, Grant P. Strimel, Athanasios Mouchtaris, Siegfried Kunzmann
Personal rare word recognition in end-to-end Automatic Speech Recognition (E2E ASR) models is a challenge due to the lack of training data.
Automatic Speech Recognition (ASR) +1
no code implementations • 11 May 2022 • Kai Wei, Dillon Knox, Martin Radfar, Thanh Tran, Markus Müller, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris, Maurizio Omologo
Dialogue act classification (DAC) is a critical task for spoken language understanding in dialogue systems.
no code implementations • 5 Nov 2021 • Feng-Ju Chang, Jing Liu, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo, Ariya Rastrow, Siegfried Kunzmann
We also leverage both BLSTM and pretrained BERT-based models to encode contextual data and guide the network training.
Automatic Speech Recognition (ASR) +1
no code implementations • 31 Oct 2021 • Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow
In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription.
Ranked #14 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
no code implementations • 30 Aug 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo
In this paper, we present a novel speech recognition model, Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency so that it is suitable for streaming decoding in on-device speech recognition.
no code implementations • 16 Jun 2021 • Michael Saxon, Samridhi Choudhary, Joseph P. McKenna, Athanasios Mouchtaris
End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model.
Ranked #10 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
no code implementations • 14 Jun 2021 • Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris
We find that tandem training of teacher and student encoders with in-place encoder distillation outperforms the use of a pre-trained, static teacher transducer.
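The distillation objective underlying such teacher-student setups is typically a temperature-softened KL divergence between the two output distributions. A minimal sketch (illustrative; the paper's in-place variant trains the teacher jointly rather than freezing it):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on temperature-softened distributions,
    scaled by T^2 as in standard knowledge distillation."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(np.sum(p * (np.log(p) - np.log(q))) * T * T)
```

The loss is zero when student and teacher logits agree and grows as their distributions diverge, giving the student a dense training signal beyond the hard transcription labels.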
no code implementations • 11 Jun 2021 • Jing Liu, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann
We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind.
no code implementations • 9 Feb 2021 • Kai Zhen, Hieu Duy Nguyen, Feng-Ju Chang, Athanasios Mouchtaris, Ariya Rastrow
In the literature, such methods are referred to as sparse pruning.
Automatic Speech Recognition (ASR) +2
no code implementations • 8 Feb 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann
Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms.
no code implementations • 18 Nov 2020 • Bhuvan Agrawal, Markus Müller, Martin Radfar, Samridhi Choudhary, Athanasios Mouchtaris, Siegfried Kunzmann
In this paper, we treat an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space (CMLS) architecture, where a shared latent space is learned between the 'acoustic' and 'text' embeddings.
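The core idea of a shared latent space is that paired acoustic and text embeddings of the same utterance should land close together. A minimal cosine-distance alignment sketch (an illustrative simplification; the CMLS paper uses richer objectives such as triplet-style losses):

```python
import numpy as np

def cmls_alignment_loss(audio_emb, text_emb):
    """Mean cosine distance between row-paired audio and text embeddings.

    Rows are L2-normalized, so each pair contributes 1 - cos(a_i, t_i):
    0 when the pair is perfectly aligned, up to 2 when anti-aligned.
    """
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    return float(np.mean(1.0 - np.sum(a * t, axis=1)))
```

Minimizing such a loss alongside the ASR/SLU objective pulls the two modalities into one space, which is what lets text-side supervision shape the acoustic encoder.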
no code implementations • 12 Aug 2020 • Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann
In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, intent, and slots vectors embedded in an audio signal with no intermediate token prediction architecture.
no code implementations • 6 Aug 2020 • Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris
We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease.
no code implementations • 8 Jul 2020 • Surabhi Punjabi, Harish Arsikere, Zeynab Raeesy, Chander Chandak, Nikhil Bhave, Ankish Bansal, Markus Müller, Sergio Murillo, Ariya Rastrow, Sri Garimella, Roland Maas, Mat Hans, Athanasios Mouchtaris, Siegfried Kunzmann
Experiments show that for English-Spanish, the bilingual joint ASR-LID architecture matches monolingual ASR and acoustic-only LID accuracies.
no code implementations • 26 Sep 2018 • Michalis Giannopoulos, Grigorios Tsagkatakis, Saverio Blasi, Farzad Toutounchi, Athanasios Mouchtaris, Panagiotis Tsakalides, Marta Mrak, Ebroul Izquierdo
Moreover, transmission also introduces delays and other distortion types which affect the perceived quality.