no code implementations • 7 May 2023 • Grant P. Strimel, Yi Xie, Brian King, Martin Radfar, Ariya Rastrow, Athanasios Mouchtaris
Streaming speech recognition architectures are employed for low-latency, real-time applications.
Automatic Speech Recognition (ASR)
no code implementations • 1 Mar 2023 • Feng-Ju Chang, Anastasios Alexandridis, Rupak Vignesh Swaminathan, Martin Radfar, Harish Mallidi, Maurizio Omologo, Athanasios Mouchtaris, Brian King, Roland Maas
We augment the MC fusion networks to a conformer transducer model and train it in an end-to-end fashion.
no code implementations • 22 Dec 2022 • Avinash Prabu, Lingxi Li, Brian King, Yaobin Chen
In particular, hidden Markov models are developed for the traffic lanes and speed changes of vehicles on highways.
no code implementations • 22 Jul 2022 • Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke
As with other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts.
no code implementations • 16 Jul 2022 • Viet Anh Trinh, Pegah Ghahremani, Brian King, Jasha Droppo, Andreas Stolcke, Roland Maas
A popular approach is to fine-tune the model with data from regions where the ASR model has a higher word error rate (WER).
Automatic Speech Recognition (ASR)
no code implementations • 5 Jul 2022 • Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel
We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.
Automatic Speech Recognition (ASR)
no code implementations • 1 Dec 2021 • I-Fan Chen, Brian King, Jasha Droppo
In this paper, we propose an approach to quantitatively analyze the impact of different training label errors on RNN-T-based ASR models.
1 code implementation • 27 Oct 2021 • Brian King, Daniel R. Kowal
However, the options for count time series are limited: Gaussian DLMs require continuous data, while Poisson-based alternatives often lack sufficient modeling flexibility.
no code implementations • 14 Jun 2021 • Rupak Vignesh Swaminathan, Brian King, Grant P. Strimel, Jasha Droppo, Athanasios Mouchtaris
We find that tandem training of teacher and student encoders with in-place encoder distillation outperforms the use of a pre-trained, static teacher transducer.
no code implementations • 4 Jun 2021 • Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas
An ASR model that operates on both primary and auxiliary data can achieve better accuracy than a primary-only solution, and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.
Automatic Speech Recognition (ASR)
no code implementations • 12 May 2021 • Bhargav Pulugundla, Yang Gao, Brian King, Gokce Keskin, Harish Mallidi, Minhua Wu, Jasha Droppo, Roland Maas
The end-to-end 2D Conv-Attention model is compared with multi-head self-attention and superdirective-based neural beamformers.
no code implementations • 8 Feb 2021 • Feng-Ju Chang, Martin Radfar, Athanasios Mouchtaris, Brian King, Siegfried Kunzmann
Transformers are powerful neural architectures that allow integrating different modalities using attention mechanisms.
no code implementations • 31 Dec 2020 • Christopher Yeung, Ryan Tsai, Benjamin Pham, Brian King, Yusaku Kawagoe, David Ho, Julia Liang, Aaswath P. Raman
Understanding how nano- or micro-scale structures and material properties can be optimally configured to attain specific functionalities remains a fundamental challenge.
Optics
no code implementations • 30 Jun 2020 • Maarten Van Segbroeck, Harish Mallidi, Brian King, I-Fan Chen, Gurpreet Chadha, Roland Maas
Acoustic models in real-time speech recognition systems typically stack multiple unidirectional LSTM layers to process the acoustic frames over time.
Automatic Speech Recognition (ASR)