Search Results for author: Athanasios Mouchtaris

Found 19 papers, 0 papers with code

Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

no code implementations • 17 Oct 2022 • Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is ubiquitous as a way to trade off model predictive performance against efficiency.

Automatic Speech Recognition, Quantization +1

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

no code implementations • 29 Sep 2022 • Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced where the encoder of RNN-T is replaced with a modified Transformer encoder composed of convolutional layers at the frontend and between attention layers.

Speech Recognition

Compute Cost Amortized Transformer for Streaming ASR

no code implementations • 5 Jul 2022 • Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.

Automatic Speech Recognition, Speech Recognition

FANS: Fusing ASR and NLU for on-device SLU

no code implementations • 31 Oct 2021 • Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription.

Ranked #13 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Multi-Channel Transformer Transducer for Speech Recognition

no code implementations • 30 Aug 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo

In this paper, we present a novel speech recognition model, Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency so that it is suitable for streaming decoding in on-device speech recognition.

Speech Recognition

End-to-End Spoken Language Understanding for Generalized Voice Assistants

no code implementations • 16 Jun 2021 • Michael Saxon, Samridhi Choudhary, Joseph P. McKenna, Athanasios Mouchtaris

End-to-end (E2E) spoken language understanding (SLU) systems predict utterance semantics directly from speech using a single model.

Ranked #9 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

CoDERT: Distilling Encoder Representations with Co-learning for Transducer-based Speech Recognition

no code implementations • 14 Jun 2021 • Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris

We find that tandem training of teacher and student encoders with an inplace encoder distillation outperforms the use of a pre-trained and static teacher transducer.

Knowledge Distillation, Speech Recognition +1

Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models

no code implementations • 11 Jun 2021 • Jing Liu, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann

We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind.

End-to-End Multi-Channel Transformer for Speech Recognition

no code implementations • 8 Feb 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms.

Speech Recognition

Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding

no code implementations • 18 Nov 2020 • Bhuvan Agrawal, Markus Müller, Martin Radfar, Samridhi Choudhary, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we treat an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space (CMLS) architecture, where a shared latent space is learned between the 'acoustic' and 'text' embeddings.

Spoken Language Understanding

End-to-End Neural Transformer Based Spoken Language Understanding

no code implementations • 12 Aug 2020 • Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, intent, and slots vectors embedded in an audio signal with no intermediate token prediction architecture.

Spoken Language Understanding

Semantic Complexity in End-to-End Spoken Language Understanding

no code implementations • 6 Aug 2020 • Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris

We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that speech-to-interpretation (STI) model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease.

Spoken Language Understanding
