no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas
The end-to-end 2D Conv-Attention model is compared with a multi-head self-attention and superdirective-based neural beamformers.
no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
We augment the MC fusion networks to a conformer transducer model and train it in an end-to-end fashion.