Search Results for author: Siegfried Kunzmann

Found 10 papers, 0 papers with code

Quantization-Aware and Tensor-Compressed Training of Transformers for Natural Language Understanding

no code implementations · 1 Jun 2023 · Zi Yang, Samridhi Choudhary, Siegfried Kunzmann, Zheng Zhang

To improve the convergence, a layer-by-layer distillation is applied to distill a quantized and tensor-compressed student model from a pre-trained transformer.
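The layer-by-layer distillation described above can be sketched as a per-layer matching loss: the compressed student is trained so each of its layer outputs stays close to the corresponding teacher layer's output, rather than matching only the final logits. This is a minimal illustrative sketch, not the paper's implementation; all function names and the plain-MSE choice are assumptions.

```python
# Hypothetical sketch of layer-by-layer distillation: sum a per-layer MSE
# between teacher and student hidden states. Names and loss choice are
# illustrative, not taken from the paper.

def mse(a, b):
    """Mean squared error between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def layerwise_distill_loss(teacher_hidden, student_hidden):
    """Sum of per-layer MSE losses.

    teacher_hidden, student_hidden: lists of layer-output vectors, one per
    transformer layer (equal depth is assumed for simplicity here).
    """
    assert len(teacher_hidden) == len(student_hidden)
    return sum(mse(t, s) for t, s in zip(teacher_hidden, student_hidden))
```

In practice each term would be computed on real tensors during training, with the student's quantized and tensor-compressed layers updated to minimize the sum.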

Natural Language Understanding · Quantization

FANS: Fusing ASR and NLU for on-device SLU

no code implementations · 31 Oct 2021 · Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription.

Ranked #14 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Exploiting Large-scale Teacher-Student Training for On-device Acoustic Models

no code implementations · 11 Jun 2021 · Jing Liu, Rupak Vignesh Swaminathan, Sree Hari Krishnan Parthasarathi, Chunchuan Lyu, Athanasios Mouchtaris, Siegfried Kunzmann

We present results from Alexa speech teams on semi-supervised learning (SSL) of acoustic models (AM) with experiments spanning over 3000 hours of GPU time, making our study one of the largest of its kind.

End-to-End Multi-Channel Transformer for Speech Recognition

no code implementations · 8 Feb 2021 · Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms.
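The attention mechanism this excerpt refers to can be illustrated with plain scaled dot-product attention, where a query is weighted against a set of keys (e.g. one per input channel or modality) to produce a mixed output. This is a generic textbook sketch under that assumption, not the paper's multi-channel architecture; all names are illustrative.

```python
# Minimal scaled dot-product attention for a single query over a set of
# key/value vectors (e.g. one pair per channel). Illustrative sketch only.
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attend(query, keys, values):
    """Weight each value by the query-key similarity of its channel."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    w = softmax(scores)
    # Weighted sum of value vectors, dimension by dimension.
    return [sum(wi * v[j] for wi, v in zip(w, values))
            for j in range(len(values[0]))]
```

With identical keys the output is simply the average of the values; with dissimilar keys, attention shifts weight toward the channel most similar to the query.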

Speech Recognition

Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding

no code implementations · 18 Nov 2020 · Bhuvan Agrawal, Markus Müller, Martin Radfar, Samridhi Choudhary, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we treat an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space (CMLS) architecture, where a shared latent space is learned between the 'acoustic' and 'text' embeddings.
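One common way to learn such a shared latent space is a triplet-style loss that pulls an embedding toward its matching counterpart from the other modality and pushes it away from a mismatched one. The sketch below is a generic illustration under that assumption, not the paper's exact objective; the margin value and function names are made up.

```python
# Illustrative triplet loss for aligning acoustic and text embeddings in a
# shared latent space. Not the paper's exact formulation.

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Zero when the matching pair is closer than the mismatched pair by
    at least `margin`; positive otherwise, driving the embeddings apart."""
    return max(0.0, l2(anchor, positive) - l2(anchor, negative) + margin)
```

For example, a text embedding used as the anchor, its paired acoustic embedding as the positive, and an acoustic embedding from a different utterance as the negative would align the two modalities utterance by utterance.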

Spoken Language Understanding

End-to-End Neural Transformer Based Spoken Language Understanding

no code implementations · 12 Aug 2020 · Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, intent, and slots vectors embedded in an audio signal with no intermediate token prediction architecture.

Spoken Language Understanding
