no code implementations • 14 Sep 2022 • Tom O'Malley, Arun Narayanan, Quan Wang
The joint model uses contextual information, such as a playback audio reference signal, noise context, and a speaker embedding.
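As a rough illustration of how such context signals could be fed to a frontend model (a minimal sketch under assumed shapes and a simple mean-pooling choice, not the paper's actual architecture), the following appends the playback reference, a pooled noise summary, and the speaker embedding to each frame of the noisy input:

```python
import numpy as np

def add_context(noisy_feats, playback_feats, noise_feats, speaker_emb):
    """Append summarized context signals to each frame of the noisy features.

    noisy_feats:    (T, F) features of the microphone signal
    playback_feats: (T, F) frame-aligned features of the playback reference
    noise_feats:    (Tn, F) features of a noise-only context segment
    speaker_emb:    (D,) speaker embedding (e.g., a d-vector)
    """
    noise_summary = noise_feats.mean(axis=0)              # (F,) pooled noise context
    T = noisy_feats.shape[0]
    ctx = np.concatenate(
        [playback_feats,                                   # frame-aligned echo reference
         np.tile(noise_summary, (T, 1)),                   # noise summary broadcast over time
         np.tile(speaker_emb, (T, 1))],                    # speaker embedding broadcast over time
        axis=1)
    return np.concatenate([noisy_feats, ctx], axis=1)      # (T, 3F + D) input to the joint model

# Toy shapes only; real feature dimensions would come from the frontend.
x = add_context(np.zeros((100, 128)), np.zeros((100, 128)),
                np.zeros((50, 128)), np.zeros(256))
print(x.shape)  # (100, 640)
```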
no code implementations • 25 Apr 2022 • Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley
This work introduces the Cleanformer, a streaming, multichannel, neural-based enhancement frontend for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +1
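The Cleanformer's exact block structure is in the paper above; purely as a hedged sketch of what a streaming, masking-based multichannel enhancement frontend can look like (the mask network, channel count, and mask definition below are placeholders, not the Cleanformer itself):

```python
import numpy as np

def enhance_stream(multichannel_stft, mask_fn):
    """Frame-by-frame masking over a multichannel STFT input.

    multichannel_stft: (T, C, F) complex STFT, C microphones
    mask_fn: callable mapping a (C, F) frame plus causal state to a
             real-valued mask of shape (F,); stands in for the learned
             streaming network.
    """
    enhanced, state = [], None
    for frame in multichannel_stft:            # strictly causal, one frame at a time
        mask, state = mask_fn(frame, state)    # network sees only past context
        enhanced.append(mask * frame[0])       # apply mask to reference channel 0
    return np.stack(enhanced)                  # (T, F) enhanced STFT fed to ASR

# Toy mask: magnitude ratio of the reference channel to the channel mean.
def toy_mask(frame, state):
    mag = np.abs(frame)
    return np.clip(mag[0] / (mag.mean(axis=0) + 1e-8), 0.0, 1.0), state

out = enhance_stream(np.ones((10, 2, 257), dtype=complex), toy_mask)
print(out.shape)  # (10, 257)
```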
no code implementations • 8 Apr 2022 • Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.
no code implementations • 18 Nov 2021 • Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard
Compared to the noisy baseline, the joint model reduces the word error rate in low signal-to-noise ratio conditions by at least 71% on our echo cancellation dataset, 10% on our noisy dataset, and 26% on our multi-speaker dataset.
no code implementations • 30 Oct 2021 • Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He
This work introduces the cross-attention conformer, an attention-based architecture for context modeling in speech enhancement.
Automatic Speech Recognition (ASR) +2
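For the core idea of attending from the signal being enhanced to a context signal, here is a minimal single-head cross-attention sketch; the shapes, random projections, and residual placement are illustrative assumptions rather than the paper's configuration:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(noisy, context, Wq, Wk, Wv):
    """Noisy-speech frames (queries) attend to context frames (keys/values).

    noisy:   (T, D) features of the signal to enhance
    context: (S, D) features of the context signal, e.g., a noise estimate
    Wq, Wk, Wv: (D, D) projection matrices
    """
    q, k, v = noisy @ Wq, context @ Wk, context @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (T, S) frame-to-context similarities
    attn = softmax(scores, axis=-1)
    return noisy + attn @ v                    # residual connection around the attention output

rng = np.random.default_rng(0)
D = 64
out = cross_attention(rng.normal(size=(100, D)), rng.normal(size=(30, D)),
                      rng.normal(size=(D, D)), rng.normal(size=(D, D)),
                      rng.normal(size=(D, D)))
print(out.shape)  # (100, 64)
```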