Search Results for author: Martin Radfar

Found 17 papers, 1 paper with code

End-to-end spoken language understanding using joint CTC loss and self-supervised, pretrained acoustic encoders

no code implementations · 4 May 2023 · Jixuan Wang, Martin Radfar, Kai Wei, Clement Chung

It is challenging to extract semantic meanings directly from audio signals in spoken language understanding (SLU), due to the lack of textual information.
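
A minimal sketch, assuming PyTorch, of the joint-CTC idea in this summary: a CTC head over a vocabulary that mixes words with intent and slot tags, placed on top of an acoustic encoder. The `nn.GRU` stub stands in for the self-supervised, pretrained encoder; all names and dimensions are illustrative assumptions, not the paper's.

```python
import torch
import torch.nn as nn

class CTCSLUModel(nn.Module):
    """Sketch: CTC over semantic tokens on top of an acoustic encoder."""
    def __init__(self, feat_dim=80, hidden=256, vocab=500):
        super().__init__()
        # Stand-in for a pretrained, self-supervised acoustic encoder.
        self.encoder = nn.GRU(feat_dim, hidden, num_layers=2, batch_first=True)
        # CTC head over a vocabulary mixing words with intent/slot tags.
        self.head = nn.Linear(hidden, vocab)

    def forward(self, feats):                    # feats: (B, T, feat_dim)
        enc, _ = self.encoder(feats)             # (B, T, hidden)
        return self.head(enc).log_softmax(-1)    # (B, T, vocab)

model = CTCSLUModel()
feats = torch.randn(4, 120, 80)                  # dummy filterbank features
log_probs = model(feats).transpose(0, 1)         # CTC expects (T, B, V)
targets = torch.randint(1, 500, (4, 12))         # dummy semantic token targets
loss = nn.CTCLoss(blank=0)(log_probs, targets,
                           torch.full((4,), 120), torch.full((4,), 12))
loss.backward()
```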

Automatic Speech Recognition (ASR) +3

Sub-8-bit quantization for on-device speech recognition: a regularization-free approach

no code implementations · 17 Oct 2022 · Kai Zhen, Martin Radfar, Hieu Duy Nguyen, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

For on-device automatic speech recognition (ASR), quantization-aware training (QAT) is widely used to trade off model predictive performance against efficiency.
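
A minimal sketch, assuming PyTorch, of the fake-quantization building block behind QAT, set to 4 bits to illustrate the sub-8-bit regime. It uses a plain straight-through estimator and does not reproduce the paper's regularization-free scheme.

```python
import torch

def fake_quant(w, bits=4):
    """Symmetric per-tensor fake quantization with a straight-through estimator."""
    qmax = 2 ** (bits - 1) - 1                   # e.g. 7 for 4-bit signed
    scale = w.detach().abs().max() / qmax
    w_q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax) * scale
    # Forward uses quantized weights; backward treats the op as identity,
    # so gradients update the full-precision copy of the weights.
    return w + (w_q - w).detach()

w = torch.randn(256, 256, requires_grad=True)
fake_quant(w).sum().backward()                   # gradients flow to w
```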

Automatic Speech Recognition (ASR) +3

ConvRNN-T: Convolutional Augmented Recurrent Neural Network Transducers for Streaming Speech Recognition

no code implementations · 29 Sep 2022 · Martin Radfar, Rohit Barnwal, Rupak Vignesh Swaminathan, Feng-Ju Chang, Grant P. Strimel, Nathan Susanj, Athanasios Mouchtaris

Very recently, as an alternative to LSTM layers, the Conformer architecture was introduced, in which the RNN-T encoder is replaced with a modified Transformer encoder that places convolutional layers at the frontend and between attention layers.
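
As a rough illustration, here is a minimal PyTorch sketch of that layout: a convolutional frontend that subsamples the features, followed by a block interleaving self-attention with a depthwise convolution. Layer sizes, kernel widths, and the block structure are illustrative assumptions, not the ConvRNN-T or Conformer configuration.

```python
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    """Sketch: self-attention followed by a depthwise conv, with residuals."""
    def __init__(self, d=256, heads=4, kernel=15):
        super().__init__()
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d)
        # Depthwise conv mixes information across nearby frames.
        self.conv = nn.Conv1d(d, d, kernel, padding=kernel // 2, groups=d)
        self.norm2 = nn.LayerNorm(d)

    def forward(self, x):                              # x: (B, T, d)
        a, _ = self.attn(x, x, x)
        x = self.norm1(x + a)
        c = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return self.norm2(x + c)

frontend = nn.Conv1d(80, 256, kernel_size=3, stride=2, padding=1)
x = torch.randn(2, 200, 80)                            # (B, T, feat)
h = frontend(x.transpose(1, 2)).transpose(1, 2)        # subsampled to T/2
h = ConvAttnBlock()(h)
print(h.shape)                                         # torch.Size([2, 100, 256])
```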

Speech Recognition

Compute Cost Amortized Transformer for Streaming ASR

no code implementations · 5 Jul 2022 · Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.
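
A minimal sketch, assuming PyTorch, of the amortization idea: a cheap per-frame gate routes frames either through an expensive branch or a residual bypass. For clarity this dense version still evaluates the big branch everywhere; a real implementation would skip the gated-off frames to realize the savings, and the paper's gating policy and training objective are not reproduced here.

```python
import torch
import torch.nn as nn

class AmortizedLayer(nn.Module):
    """Sketch: per-frame gate choosing between a big branch and a bypass."""
    def __init__(self, d=256):
        super().__init__()
        self.gate = nn.Linear(d, 1)   # cheap per-frame arbitrator
        self.big = nn.Sequential(nn.Linear(d, 4 * d), nn.ReLU(),
                                 nn.Linear(4 * d, d))

    def forward(self, x):                                   # x: (B, T, d)
        keep = (torch.sigmoid(self.gate(x)) > 0.5).float()  # 1 = run big branch
        # Gated residual: frames with keep == 0 pass through unchanged.
        # (Training would need a differentiable gate, e.g. a relaxation.)
        return x + keep * self.big(x)

y = AmortizedLayer()(torch.randn(2, 50, 256))
```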

Automatic Speech Recognition (ASR) +1

Multi-task RNN-T with Semantic Decoder for Streamable Spoken Language Understanding

no code implementations · 1 Apr 2022 · Xuandi Fu, Feng-Ju Chang, Martin Radfar, Kai Wei, Jing Liu, Grant P. Strimel, Kanthashree Mysore Sathyendra

In addition, the NLU model in the two-stage system is not streamable, as it must wait for the audio segments to finish processing, which ultimately increases the latency of the SLU system.

Automatic Speech Recognition (ASR) +3

Speech Emotion Recognition Using Quaternion Convolutional Neural Networks

no code implementations · 31 Oct 2021 · Aneesh Muppidi, Martin Radfar

Specifically, the model achieves accuracies of 77.87%, 70.46%, and 88.78% for the RAVDESS, IEMOCAP, and EMO-DB datasets, respectively.

Speech Emotion Recognition · Speech Recognition +1

FANS: Fusing ASR and NLU for on-device SLU

no code implementations · 31 Oct 2021 · Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription.
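
A minimal sketch, assuming PyTorch, of the fusion pattern described here: one audio encoder feeding multi-task heads that predict the intent and per-frame slot tags directly, with no intermediate transcript. The architecture and dimensions are illustrative assumptions, not the FANS model.

```python
import torch
import torch.nn as nn

class FusedSLU(nn.Module):
    """Sketch: shared audio encoder with multi-task intent/slot heads."""
    def __init__(self, feat=80, d=256, n_intents=30, n_slots=60):
        super().__init__()
        self.encoder = nn.LSTM(feat, d, num_layers=2, batch_first=True)
        self.intent_head = nn.Linear(d, n_intents)   # utterance-level intent
        self.slot_head = nn.Linear(d, n_slots)       # per-frame slot tags

    def forward(self, feats):                        # feats: (B, T, feat)
        enc, _ = self.encoder(feats)
        intent_logits = self.intent_head(enc.mean(dim=1))  # pooled over time
        slot_logits = self.slot_head(enc)
        return intent_logits, slot_logits

intent, slots = FusedSLU()(torch.randn(2, 100, 80))
print(intent.shape, slots.shape)                     # (2, 30) (2, 100, 60)
```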

Ranked #14 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Multi-Channel Transformer Transducer for Speech Recognition

no code implementations · 30 Aug 2021 · Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Maurizio Omologo

In this paper, we present a novel speech recognition model, Multi-Channel Transformer Transducer (MCTT), which features end-to-end multi-channel training, low computation cost, and low latency so that it is suitable for streaming decoding in on-device speech recognition.
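
As a rough sketch of end-to-end multi-channel modeling with attention, assuming PyTorch: per-channel features are fused by attending across the microphone axis at each frame before a single-stream encoder. The fusion scheme shown is an illustrative assumption, not MCTT's channel-wise attention.

```python
import torch
import torch.nn as nn

B, C, T, d = 2, 4, 100, 256             # batch, mics, frames, feature dim
x = torch.randn(B, C, T, d)             # per-channel frame features

attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)
# Attend across the channel axis independently at each frame.
xc = x.permute(0, 2, 1, 3).reshape(B * T, C, d)   # (B*T, C, d)
fused, _ = attn(xc, xc, xc)
fused = fused.mean(dim=1).reshape(B, T, d)        # one stream for the encoder
```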

Speech Recognition

The Performance Evaluation of Attention-Based Neural ASR under Mixed Speech Input

2 code implementations · 3 Aug 2021 · Bradley He, Martin Radfar

In this paper, we present mixtures of speech signals to a popular attention-based neural ASR model, known as Listen, Attend, and Spell (LAS), at different target-to-interference ratios (TIRs) and measure the phoneme error rate.
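
A small NumPy sketch of the evaluation setup this summary describes: mixing a target utterance with an interfering one at a chosen target-to-interference ratio in dB. The function name `mix_at_tir` and the random stand-in signals are ours, not the paper's code.

```python
import numpy as np

def mix_at_tir(target, interference, tir_db):
    """Scale the interference so 10*log10(P_target / P_interference) == tir_db."""
    pt = np.mean(target ** 2)
    pi = np.mean(interference ** 2)
    gain = np.sqrt(pt / (pi * 10 ** (tir_db / 10)))
    return target + gain * interference

t = np.random.randn(16000)      # 1 s of stand-in "speech" at 16 kHz
i = np.random.randn(16000)      # stand-in interference
mixed = mix_at_tir(t, i, tir_db=5.0)
```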

End-to-End Multi-Channel Transformer for Speech Recognition

no code implementations · 8 Feb 2021 · Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann

Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms.

Speech Recognition

Encoding Syntactic Knowledge in Transformer Encoder for Intent Detection and Slot Filling

no code implementations · 21 Dec 2020 · Jixuan Wang, Kai Wei, Martin Radfar, Weiwei Zhang, Clement Chung

We propose a novel Transformer encoder-based architecture with syntactical knowledge encoded for intent detection and slot filling.

Intent Detection · Multi-Task Learning +2

Tie Your Embeddings Down: Cross-Modal Latent Spaces for End-to-end Spoken Language Understanding

no code implementations · 18 Nov 2020 · Bhuvan Agrawal, Markus Müller, Martin Radfar, Samridhi Choudhary, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we treat an E2E system as a multi-modal model, with audio and text functioning as its two modalities, and use a cross-modal latent space (CMLS) architecture, where a shared latent space is learned between the 'acoustic' and 'text' embeddings.
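
A minimal sketch, assuming PyTorch, of the shared-space idea: acoustic and text embeddings are projected into one latent space and pulled together with a distance loss. The paper studies specific cross-modal losses; the simple MSE here is only illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

acoustic_proj = nn.Linear(512, 128)     # from an audio encoder's output
text_proj = nn.Linear(300, 128)         # from a text encoder's output

a = acoustic_proj(torch.randn(8, 512))  # 8 paired utterances
t = text_proj(torch.randn(8, 300))      # their text embeddings
# Pull paired embeddings together in the shared 128-dim latent space.
loss = F.mse_loss(F.normalize(a, dim=-1), F.normalize(t, dim=-1))
loss.backward()
```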

Spoken Language Understanding

End-to-End Neural Transformer Based Spoken Language Understanding

no code implementations · 12 Aug 2020 · Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann

In this paper, we introduce an end-to-end neural transformer-based SLU model that can predict the variable-length domain, intent, and slots vectors embedded in an audio signal with no intermediate token prediction architecture.

Spoken Language Understanding
