no code implementations • 17 Jan 2025 • Cassia Nunes Almeida, Arun Narayanan, Hafiz Majid Hussain, Pedro H. J. Nardelli
In this paper, we compare pricing and non-pricing mechanisms for implementing demand-side management (DSM) in a neighborhood in Helsinki, Finland.
no code implementations • 9 Oct 2024 • Hongbin Liu, Youzheng Chen, Arun Narayanan, Athula Balachandran, Pedro J. Moreno, Lun Wang
Recent advances in text-to-speech (TTS) systems, particularly those with voice cloning capabilities, have made voice impersonation readily accessible, raising ethical and legal concerns due to potential misuse for malicious activities like misinformation campaigns and fraud.
no code implementations • 21 Sep 2024 • Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, Arun Narayanan
In this paper, we apply differentially private (DP) pre-training to a state-of-the-art (SOTA) Conformer-based encoder and study its performance on a downstream ASR task, assuming that the fine-tuning data is public.
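To make the DP pre-training step concrete, here is a minimal DP-SGD update sketch in NumPy, assuming per-example gradients are already available; the clip norm, noise multiplier, and learning rate are illustrative placeholders, not the paper's settings.

```python
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm=1.0, noise_multiplier=1.0, lr=0.05):
    """One DP-SGD update: clip each per-example gradient, sum, add Gaussian
    noise scaled to the clip norm, average, and take a gradient step."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    noisy_sum = np.sum(clipped, axis=0) + np.random.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)

# Toy usage: four per-example gradients for a 10-dimensional parameter vector.
params = dp_sgd_step(np.zeros(10), [np.random.randn(10) for _ in range(4)])
```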
no code implementations • 4 Jun 2024 • Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, Arun Narayanan
Gradient clipping plays a vital role in training large-scale automatic speech recognition (ASR) models.
Automatic Speech Recognition (ASR) • +2
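For illustration, a generic global-norm gradient clipping routine of the kind used in large-scale ASR training; this is a sketch, not the paper's implementation.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=1.0):
    """Scale a list of gradient arrays so their combined L2 norm is at most max_norm."""
    global_norm = np.sqrt(sum(float(np.sum(g * g)) for g in grads))
    scale = min(1.0, max_norm / (global_norm + 1e-12))
    return [g * scale for g in grads], global_norm

clipped, norm_before_clipping = clip_by_global_norm([np.random.randn(3, 3), np.random.randn(5)])
```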
no code implementations • 27 Feb 2024 • Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno
In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.
Automatic Speech Recognition (ASR) • +2
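A minimal sketch of the frame-reduction idea described in this entry: stack consecutive encoder frames to cut the output frame rate; array shapes are illustrative, not the paper's configuration.

```python
import numpy as np

def frame_reduction(encoder_frames, stack=2):
    """Concatenate every `stack` consecutive frames into one, reducing the
    encoder's output frame rate by a factor of `stack`."""
    t, d = encoder_frames.shape
    t_trim = (t // stack) * stack                      # drop a trailing partial group
    return encoder_frames[:t_trim].reshape(t_trim // stack, d * stack)

# e.g. 100 frames of 256 dims become 50 frames of 512 dims
reduced = frame_reduction(np.zeros((100, 256)), stack=2)
```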
no code implementations • 14 Sep 2022 • Tom O'Malley, Arun Narayanan, Quan Wang
The joint model uses contextual information, such as a reference of the playback audio, noise context, and speaker embedding.
no code implementations • 17 May 2022 • Joe Caroselli, Arun Narayanan, Yiteng Huang
The first is the Context Aware Beamformer, which uses the noise context and the detected hotword to determine how to target the desired speaker.
Automatic Speech Recognition (ASR) • +2
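As a rough sketch of how a noise context and a detected hotword segment can steer a beamformer, here is a textbook MVDR construction for one frequency bin; this is an assumption-laden illustration, not the paper's Context Aware Beamformer.

```python
import numpy as np

def context_aware_mvdr(noise_ctx, hotword_seg, diag_load=1e-6):
    """MVDR-style weights for one frequency bin: the noise context supplies the
    noise covariance, and the detected-hotword segment supplies a steering
    vector (top eigenvector of the target covariance) for the desired speaker.
    Inputs are complex STFT frames of shape (time, channels)."""
    Rn = noise_ctx.conj().T @ noise_ctx / len(noise_ctx)        # noise covariance
    Rs = hotword_seg.conj().T @ hotword_seg / len(hotword_seg)  # target covariance
    h = np.linalg.eigh(Rs)[1][:, -1]                            # steering vector estimate
    rn_inv_h = np.linalg.solve(Rn + diag_load * np.eye(len(h)), h)
    return rn_inv_h / (h.conj() @ rn_inv_h)                     # MVDR beamforming weights

rng = np.random.default_rng(0)
w = context_aware_mvdr(rng.standard_normal((200, 4)) + 1j * rng.standard_normal((200, 4)),
                       rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4)))
```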
no code implementations • 6 May 2022 • Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein
Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio.
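For orientation, a classical NLMS echo canceller that uses the playback reference, shown only to illustrate the AEC problem setup; the paper itself takes a neural approach, and the tap count and step size here are arbitrary.

```python
import numpy as np

def nlms_aec(mic, playback, taps=128, mu=0.5, eps=1e-6):
    """Normalized LMS echo canceller: adaptively filter the playback reference
    to estimate the echo, subtract it from the microphone signal, and return
    the residual (the echo-free estimate of the user's query)."""
    w = np.zeros(taps)
    out = np.zeros_like(mic, dtype=float)
    for n in range(taps, len(mic)):
        x = playback[n - taps:n][::-1]        # most recent reference samples
        err = mic[n] - w @ x                  # residual after removing estimated echo
        w += mu * err * x / (x @ x + eps)     # normalized LMS weight update
        out[n] = err
    return out

residual = nlms_aec(np.random.randn(16000), np.random.randn(16000))
```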
no code implementations • 26 Apr 2022 • Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi
Using neural-network-based acoustic frontends to improve the robustness of streaming automatic speech recognition (ASR) systems is challenging because of causality constraints and the distortion that the frontend processing introduces in speech.
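The causality constraint mentioned above can be made concrete with a left-padded convolution, so each output frame depends only on current and past input; this is a generic PyTorch sketch, not the paper's frontend.

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    """Causality-respecting convolution for a streaming frontend: all padding is
    on the left, so each output frame sees only current and past input frames."""
    def __init__(self, channels=80, kernel_size=5):
        super().__init__()
        self.pad = kernel_size - 1
        self.conv = nn.Conv1d(channels, channels, kernel_size)

    def forward(self, x):                                   # x: (batch, channels, time)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

features = CausalConv1d()(torch.randn(1, 80, 200))          # output keeps 200 frames
```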
no code implementations • 25 Apr 2022 • Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley
This work introduces the Cleanformer, a streaming, multichannel, neural enhancement frontend for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) • +1
no code implementations • 18 Apr 2022 • Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays
We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models.
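A minimal sketch of the masking step behind such a fill-in-the-blank probe: replace a chosen audio span with noise at a target SNR before transcribing the result with the model under attack. The function name and SNR handling are illustrative assumptions; running the full attack also requires a trained ASR model.

```python
import numpy as np

def noise_mask(waveform, start, end, snr_db=0.0):
    """Replace the span [start, end) of an utterance with white noise at the
    given SNR, so a trained ASR model is forced to 'fill in the blank' for
    that span when transcribing the masked audio."""
    masked = waveform.astype(float).copy()
    span = masked[start:end]
    noise = np.random.randn(end - start)
    target_power = np.mean(span ** 2) / (10 ** (snr_db / 10))
    noise *= np.sqrt(target_power / (np.mean(noise ** 2) + 1e-12))
    masked[start:end] = noise
    return masked

masked_audio = noise_mask(np.random.randn(16000), start=4000, end=8000)
```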
no code implementations • 8 Apr 2022 • Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw
Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.
no code implementations • 18 Nov 2021 • Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard
Compared to the noisy baseline, the joint model reduces the word error rate in low signal-to-noise ratio conditions by at least 71% on our echo cancellation dataset, 10% on our noisy dataset, and 26% on our multi-speaker dataset.
no code implementations • 1 Nov 2021 • Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani
Furthermore, by analyzing the predicted target SNRi, we observed that the jointly trained network automatically controls the target SNRi according to the noise characteristics.
Automatic Speech Recognition (ASR) • +2
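For reference, SNR improvement (SNRi) is the output SNR of the enhanced signal minus the input SNR of the noisy mixture; a small NumPy sketch, assuming access to the clean reference signal:

```python
import numpy as np

def snr_db(reference, residual):
    """SNR in dB of a signal given its residual noise, relative to the clean reference."""
    return 10 * np.log10(np.sum(reference ** 2) / (np.sum(residual ** 2) + 1e-12))

def snr_improvement(clean, noisy, enhanced):
    """SNRi: output SNR of the enhanced signal minus input SNR of the noisy mixture."""
    return snr_db(clean, enhanced - clean) - snr_db(clean, noisy - clean)

clean = np.sin(np.linspace(0, 100, 16000))
noisy = clean + 0.5 * np.random.randn(16000)
print(snr_improvement(clean, noisy, 0.7 * clean + 0.3 * noisy))   # positive SNRi in dB
```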
no code implementations • 30 Oct 2021 • Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He
This work introduces the cross-attention conformer, an attention-based architecture for context modeling in speech enhancement.
Automatic Speech Recognition (ASR) • +2
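A compact sketch of cross-attention for context modeling, in which enhancement features attend over context features (for example a noise context or playback reference); the layer layout and sizes are assumptions, not the paper's exact cross-attention conformer.

```python
import torch
import torch.nn as nn

class CrossAttentionBlock(nn.Module):
    """Enhancement features (queries) attend over context features (keys/values)
    before a residual feed-forward layer."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))

    def forward(self, features, context):
        attended, _ = self.attn(features, context, context)   # cross-attention
        x = self.norm1(features + attended)                   # residual + norm
        return self.norm2(x + self.ffn(x))

out = CrossAttentionBlock()(torch.randn(2, 100, 256), torch.randn(2, 20, 256))
```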
no code implementations • 19 Aug 2021 • Hafiz Majid Hussain, Ashfaq Ahmad, Arun Narayanan, Pedro H. J. Nardelli, Yongheng Yang
In this paper, we investigate the management of energy storage control and load scheduling in scenarios considering a grid-connected photovoltaic (PV) system using packetized energy management.
no code implementations • 28 Apr 2021 • Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw
In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary.
Automatic Speech Recognition (ASR) • +3
no code implementations • 1 Feb 2021 • Pedro H. J. Nardelli, Hafiz Majid Hussein, Arun Narayanan, Yongheng Yang
Digitalization has led to radical changes in the distribution of goods across various sectors.
no code implementations • 12 Dec 2020 • Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath
End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) • +1
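The "condition on all previously predicted labels" property can be illustrated with a greedy label-synchronous decode loop; `step_fn` is a hypothetical callable mapping the label history to logits over the vocabulary, not part of the paper.

```python
import torch

def greedy_autoregressive_decode(step_fn, start_id, eos_id, max_len=100):
    """Greedy label-synchronous decoding: every new label is predicted from the
    full history of previously emitted labels. `step_fn(history)` is assumed to
    return a (history_length, vocab_size) tensor of logits."""
    labels = [start_id]
    for _ in range(max_len):
        logits = step_fn(torch.tensor(labels))     # condition on all previous labels
        next_id = int(torch.argmax(logits[-1]))
        if next_id == eos_id:
            break
        labels.append(next_id)
    return labels[1:]
```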
no code implementations • 21 Nov 2020 • Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu
To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.
Audio and Speech Processing • Sound
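For context, a compact Conformer layer of the kind substituted for the LSTMs (macaron feed-forward, self-attention, and a depthwise convolution); dimensions and kernel size are illustrative, not the paper's configuration.

```python
import torch
import torch.nn as nn

class ConformerBlock(nn.Module):
    """Simplified Conformer block: macaron feed-forward / self-attention /
    depthwise-convolution, each with a residual connection."""
    def __init__(self, dim=256, heads=4, kernel=15):
        super().__init__()
        self.ffn1 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.attn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(dim)
        self.depthwise = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.ffn2 = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 4 * dim), nn.SiLU(), nn.Linear(4 * dim, dim))
        self.out_norm = nn.LayerNorm(dim)

    def forward(self, x):                                   # x: (batch, time, dim)
        x = x + 0.5 * self.ffn1(x)                          # first macaron feed-forward
        y = self.attn_norm(x)
        x = x + self.attn(y, y, y)[0]                       # multi-head self-attention
        c = self.depthwise(self.conv_norm(x).transpose(1, 2)).transpose(1, 2)
        x = x + c                                           # (simplified) convolution module
        x = x + 0.5 * self.ffn2(x)                          # second macaron feed-forward
        return self.out_norm(x)

out = ConformerBlock()(torch.randn(2, 100, 256))
```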
no code implementations • 27 Oct 2020 • Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman
The proposed model consists of streaming and non-streaming encoders.
Automatic Speech Recognition (ASR) • +2
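A minimal sketch of the streaming/non-streaming cascade: a causal encoder produces first-pass features, and a non-causal encoder stacked on top refines them with full context. The LSTM layers and sizes here are stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CascadedEncoders(nn.Module):
    """A causal (streaming) encoder produces first-pass features; a non-causal
    encoder refines them with right context for a higher-quality second pass,
    while both share a single model."""
    def __init__(self, dim=256):
        super().__init__()
        self.streaming = nn.LSTM(dim, dim, batch_first=True)                         # left context only
        self.non_streaming = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.project = nn.Linear(2 * dim, dim)

    def forward(self, x):                         # x: (batch, time, dim)
        causal, _ = self.streaming(x)             # usable for streaming decoding
        refined, _ = self.non_streaming(causal)   # full-context refinement
        return causal, self.project(refined)

streaming_out, full_context_out = CascadedEncoders()(torch.randn(2, 100, 256))
```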
no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao
We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.
Automatic Speech Recognition (ASR) • +1
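At its core, this distillation reduces to pseudo-labeling: the non-streaming teacher transcribes unlabeled audio, and the streaming student trains on those transcripts as targets. `teacher.transcribe` and `train_step` are hypothetical callables used only to sketch the loop.

```python
def distill_to_streaming(teacher, student, unlabeled_audio, train_step):
    """Pseudo-labeling loop: the non-streaming teacher transcribes each unlabeled
    utterance, and the streaming student is updated on that transcript as if it
    were a ground-truth target."""
    for audio in unlabeled_audio:
        pseudo_label = teacher.transcribe(audio)   # full-context, higher-quality hypothesis
        train_step(student, audio, pseudo_label)   # ordinary supervised update on the student
```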
1 code implementation • 21 Oct 2020 • Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang
FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.
Automatic Speech Recognition (ASR) • +2
no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu
On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.
Automatic Speech Recognition (ASR) • +1
no code implementations • 28 Mar 2020 • Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.
no code implementations • 6 Nov 2019 • Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu
In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription.
Automatic Speech Recognition (ASR) • +3
no code implementations • 24 Oct 2019 • Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman
In this work, we examine the ability of E2E models to generalize to unseen domains, where we find that models trained on short utterances fail to generalize to long-form speech.
Automatic Speech Recognition (ASR) • +3
no code implementations • 24 Sep 2018 • Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters
Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments.
Automatic Speech Recognition (ASR) • +4
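A sketch of the conventional two-module pipeline that the paper contrasts with its end-to-end approach; `asr` and `nlu` are hypothetical components, shown only to make the interface explicit.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class SluResult:
    domain: str
    intent: str
    arguments: Dict[str, str]

def conventional_slu(audio: bytes,
                     asr: Callable[[bytes], str],
                     nlu: Callable[[str], SluResult]) -> SluResult:
    """Two-module pipeline: ASR maps audio to a transcript, then NLU maps the
    transcript to a domain, an intent, and its arguments."""
    transcript = asr(audio)   # audio -> text (or top-N hypotheses in practice)
    return nlu(transcript)    # text -> structured interpretation
```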
no code implementations • 16 Aug 2018 • Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani
More importantly, such models generalize better to unseen conditions and allow for rapid adaptation: we show that, by using as little as 10 hours of data from a new domain, an adapted domain-invariant model can match the performance of a domain-specific model trained from scratch on 70 times as much data.
Automatic Speech Recognition (ASR) • +1
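A generic sketch of the rapid-adaptation step: briefly fine-tune the domain-invariant model on a small amount of in-domain data. This uses PyTorch-style training conventions with hypothetical objects; it is an illustration of plain fine-tuning, not the paper's exact recipe.

```python
def adapt_to_new_domain(model, new_domain_batches, optimizer, loss_fn, max_steps=1000):
    """Brief fine-tuning of a domain-invariant model on a small amount of
    in-domain data, with no architecture changes."""
    model.train()
    for step, (features, targets) in enumerate(new_domain_batches):
        if step >= max_steps:
            break
        optimizer.zero_grad()
        loss = loss_fn(model(features), targets)
        loss.backward()          # standard supervised update on in-domain batches
        optimizer.step()
```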