Search Results for author: Arun Narayanan

Found 25 papers, 1 paper with code

Toward domain-invariant speech recognition via large scale training

no code implementations • 16 Aug 2018 • Arun Narayanan, Ananya Misra, Khe Chai Sim, Golan Pundak, Anshuman Tripathi, Mohamed Elfeky, Parisa Haghani, Trevor Strohman, Michiel Bacchiani

More importantly, such models generalize better to unseen conditions and allow for rapid adaptation: we show that by using as little as 10 hours of data from a new domain, an adapted domain-invariant model can match performance of a domain-specific model trained from scratch using 70 times as much data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

From Audio to Semantics: Approaches to end-to-end spoken language understanding

no code implementations • 24 Sep 2018 • Parisa Haghani, Arun Narayanan, Michiel Bacchiani, Galen Chuang, Neeraj Gaur, Pedro Moreno, Rohit Prabhavalkar, Zhongdi Qu, Austin Waters

Conventional spoken language understanding systems consist of two main components: an automatic speech recognition module that converts audio to a transcript, and a natural language understanding module that transforms the resulting text (or top N hypotheses) into a set of domains, intents, and arguments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Recognizing long-form speech using streaming end-to-end models

no code implementations • 24 Oct 2019 • Arun Narayanan, Rohit Prabhavalkar, Chung-Cheng Chiu, David Rybach, Tara N. Sainath, Trevor Strohman

In this work, we examine the ability of E2E models to generalize to unseen domains, where we find that models trained on short utterances fail to generalize to long-form speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A comparison of end-to-end models for long-form speech recognition

no code implementations • 6 Nov 2019 • Chung-Cheng Chiu, Wei Han, Yu Zhang, Ruoming Pang, Sergey Kishchenko, Patrick Nguyen, Arun Narayanan, Hank Liao, Shuyuan Zhang, Anjuli Kannan, Rohit Prabhavalkar, Zhifeng Chen, Tara Sainath, Yonghui Wu

In this paper, we both investigate and improve the performance of end-to-end models on long-form transcription.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Streaming On-Device End-to-End Model Surpassing Server-Side Conventional Model Quality and Latency

no code implementations • 28 Mar 2020 • Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao

Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time at which the hypothesis is finalized after the user stops speaking.

Sentence

RNN-T Models Fail to Generalize to Out-of-Domain Audio: Causes and Solutions

no code implementations • 7 May 2020 • Chung-Cheng Chiu, Arun Narayanan, Wei Han, Rohit Prabhavalkar, Yu Zhang, Navdeep Jaitly, Ruoming Pang, Tara N. Sainath, Patrick Nguyen, Liangliang Cao, Yonghui Wu

On a long-form YouTube test set, when the non-streaming RNN-T model is trained with shorter segments of data, the proposed combination improves word error rate (WER) from 22.3% to 14.8%; when the streaming RNN-T model is trained on short Search queries, the proposed techniques improve WER on the YouTube set from 67.0% to 25.3%.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
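
The size of these gains is easier to compare as a relative reduction. The sketch below computes it as (baseline - improved) / baseline; the helper name `relative_reduction` is hypothetical and not from the paper, and the inputs are only the WER figures quoted in the snippet above.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Relative reduction in percent: (baseline - improved) / baseline * 100."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline * 100.0


# WER figures (in %) quoted in the abstract snippet above.
RESULTS = [
    ("non-streaming RNN-T, short training segments", 22.3, 14.8),
    ("streaming RNN-T, short Search queries", 67.0, 25.3),
]

if __name__ == "__main__":
    for setup, before, after in RESULTS:
        print(f"{setup}: {before}% -> {after}% WER "
              f"({relative_reduction(before, after):.1f}% relative reduction)")
```

By this measure the reported WERs drop by about 33.6% relative for the non-streaming setup and about 62.2% relative for the streaming setup.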

FastEmit: Low-latency Streaming ASR with Sequence-level Emission Regularization

1 code implementation • 21 Oct 2020 • Jiahui Yu, Chung-Cheng Chiu, Bo Li, Shuo-Yiin Chang, Tara N. Sainath, Yanzhang He, Arun Narayanan, Wei Han, Anmol Gulati, Yonghui Wu, Ruoming Pang

FastEmit also improves streaming ASR accuracy from 4.4%/8.9% to 3.1%/7.5% WER, while reducing 90th-percentile latency from 210 ms to only 30 ms on LibriSpeech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Improving Streaming Automatic Speech Recognition With Non-Streaming Model Distillation On Unsupervised Data

no code implementations • 22 Oct 2020 • Thibault Doutre, Wei Han, Min Ma, Zhiyun Lu, Chung-Cheng Chiu, Ruoming Pang, Arun Narayanan, Ananya Misra, Yu Zhang, Liangliang Cao

We propose a novel and effective learning method by leveraging a non-streaming ASR model as a teacher to generate transcripts on an arbitrarily large data set, which is then used to distill knowledge into streaming ASR models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Cascaded encoders for unifying streaming and non-streaming ASR

no code implementations • 27 Oct 2020 • Arun Narayanan, Tara N. Sainath, Ruoming Pang, Jiahui Yu, Chung-Cheng Chiu, Rohit Prabhavalkar, Ehsan Variani, Trevor Strohman

The proposed model consists of streaming and non-streaming encoders.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Better and Faster End-to-End Model for Streaming ASR

no code implementations • 21 Nov 2020 • Bo Li, Anmol Gulati, Jiahui Yu, Tara N. Sainath, Chung-Cheng Chiu, Arun Narayanan, Shuo-Yiin Chang, Ruoming Pang, Yanzhang He, James Qin, Wei Han, Qiao Liang, Yu Zhang, Trevor Strohman, Yonghui Wu

To address this, we explore replacing the LSTM layers in the encoder of our E2E model with Conformer layers [4], which have shown good improvements for ASR.

Audio and Speech Processing Sound

Less Is More: Improved RNN-T Decoding Using Limited Label Context and Path Merging

no code implementations • 12 Dec 2020 • Rohit Prabhavalkar, Yanzhang He, David Rybach, Sean Campbell, Arun Narayanan, Trevor Strohman, Tara N. Sainath

End-to-end models that condition the output label sequence on all previously predicted labels have emerged as popular alternatives to conventional systems for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Virtual Microgrid Management via Software-defined Energy Network for Electricity Sharing

no code implementations • 1 Feb 2021 • Pedro H. J. Nardelli, Hafiz Majid Hussein, Arun Narayanan, Yongheng Yang

Digitalization has led to radical changes in the distribution of goods across various sectors.

energy management Management

Personalized Keyphrase Detection using Speaker and Environment Information

no code implementations • 28 Apr 2021 • Rajeev Rikhye, Quan Wang, Qiao Liang, Yanzhang He, Ding Zhao, Yiteng Huang, Arun Narayanan, Ian McGraw

In this paper, we introduce a streaming keyphrase detection system that can be easily customized to accurately detect any phrase composed of words from a large vocabulary.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Packetized Energy Management Controller for Residential Consumers

no code implementations • 19 Aug 2021 • Hafiz Majid Hussain, Ashfaq Ahmad, Arun Narayanan, Pedro H. J. Nardelli, Yongheng Yang

In this paper, we investigate the management of energy storage control and load scheduling in scenarios considering a grid-connected photovoltaic (PV) system using packetized energy management.

energy management Management +1

Cross-attention conformer for context modeling in speech enhancement for ASR

no code implementations • 30 Oct 2021 • Arun Narayanan, Chung-Cheng Chiu, Tom O'Malley, Quan Wang, Yanzhang He

This work introduces the cross-attention conformer, an attention-based architecture for context modeling in speech enhancement.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SNRi Target Training for Joint Speech Enhancement and Recognition

no code implementations • 1 Nov 2021 • Yuma Koizumi, Shigeki Karita, Arun Narayanan, Sankaran Panchapagesan, Michiel Bacchiani

Furthermore, by analyzing the predicted target SNRi, we observed that the jointly trained network automatically controls the target SNRi according to noise characteristics.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Conformer-based ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement and Speech Separation

no code implementations • 18 Nov 2021 • Tom O'Malley, Arun Narayanan, Quan Wang, Alex Park, James Walker, Nathan Howard

Compared to the noisy baseline, the joint model reduces the word error rate in low signal-to-noise ratio conditions by at least 71% on our echo cancellation dataset, 10% on our noisy dataset, and 26% on our multi-speaker dataset.

Acoustic echo cancellation Automatic Speech Recognition +4

Personal VAD 2.0: Optimizing Personal Voice Activity Detection for On-Device Speech Recognition

no code implementations • 8 Apr 2022 • Shaojin Ding, Rajeev Rikhye, Qiao Liang, Yanzhang He, Quan Wang, Arun Narayanan, Tom O'Malley, Ian McGraw

Personalization of on-device speech recognition (ASR) has seen explosive growth in recent years, largely due to the increasing popularity of personal assistant features on mobile devices and smart home speakers.

Action Detection Activity Detection +2

Extracting Targeted Training Data from ASR Models, and How to Mitigate It

no code implementations • 18 Apr 2022 • Ehsan Amid, Om Thakkar, Arun Narayanan, Rajiv Mathews, Françoise Beaufays

We design Noise Masking, a fill-in-the-blank style method for extracting targeted parts of training data from trained ASR models.

Data Augmentation

Cleanformer: A multichannel array configuration-invariant neural enhancement frontend for ASR in smart speakers

no code implementations • 25 Apr 2022 • Joseph Caroselli, Arun Narayanan, Nathan Howard, Tom O'Malley

This work introduces the Cleanformer, a streaming multichannel neural-based enhancement frontend for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Mask scalar prediction for improving robust automatic speech recognition

no code implementations • 26 Apr 2022 • Arun Narayanan, James Walker, Sankaran Panchapagesan, Nathan Howard, Yuma Koizumi

Using neural-network-based acoustic frontends to improve the robustness of streaming automatic speech recognition (ASR) systems is challenging because of the causality constraints and the resulting distortion that the frontend processing introduces into speech.

Acoustic echo cancellation Automatic Speech Recognition +2

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

no code implementations • 6 May 2022 • Sankaran Panchapagesan, Arun Narayanan, Turaj Zakizadeh Shabestary, Shuai Shao, Nathan Howard, Alex Park, James Walker, Alexander Gruenstein

Acoustic Echo Cancellation (AEC) is essential for accurate recognition of queries spoken to a smart speaker that is playing out audio.

Acoustic echo cancellation Automatic Speech Recognition +2

Streaming Noise Context Aware Enhancement For Automatic Speech Recognition in Multi-Talker Environments

no code implementations • 17 May 2022 • Joe Caroselli, Arun Narayanan, Yiteng Huang

The first is the Context Aware Beamformer, which uses the noise context and the detected hotword to determine how to target the desired speaker.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Universally-Deployable ASR Frontend for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation

no code implementations • 14 Sep 2022 • Tom O'Malley, Arun Narayanan, Quan Wang

The joint model uses contextual information, such as a reference of the playback audio, noise context, and speaker embedding.

Acoustic echo cancellation Automatic Speech Recognition +3

Extreme Encoder Output Frame Rate Reduction: Improving Computational Latencies of Large End-to-End Models

no code implementations • 27 Feb 2024 • Rohit Prabhavalkar, Zhong Meng, Weiran Wang, Adam Stooke, Xingyu Cai, Yanzhang He, Arun Narayanan, Dongseong Hwang, Tara N. Sainath, Pedro J. Moreno

In the present work, we study one such strategy: applying multiple frame reduction layers in the encoder to compress encoder outputs into a small number of output frames.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
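
One simple way to reduce an encoder's output frame rate is to concatenate each group of consecutive frames into a single wider frame, and to stack several such reduction stages. The sketch below illustrates that generic idea in plain Python; it is an assumption for illustration, not necessarily the frame reduction layer used in the paper, and the function names `stack_frames` and `reduce_frame_rate` are hypothetical. A real layer would typically follow the concatenation with a learned projection back to a fixed width, which is omitted here.

```python
from typing import List, Sequence

Frame = List[float]


def stack_frames(frames: Sequence[Frame], factor: int) -> List[Frame]:
    """Concatenate each group of `factor` consecutive frames into one frame.

    The last group is zero-padded when len(frames) is not a multiple of
    `factor`, so every output frame has width factor * input_width.
    """
    if factor < 1:
        raise ValueError("factor must be >= 1")
    if not frames:
        return []
    width = len(frames[0])
    output: List[Frame] = []
    for start in range(0, len(frames), factor):
        group = list(frames[start:start + factor])
        # Pad the final, shorter group with all-zero frames.
        while len(group) < factor:
            group.append([0.0] * width)
        stacked: Frame = []
        for frame in group:
            stacked.extend(frame)
        output.append(stacked)
    return output


def reduce_frame_rate(frames: Sequence[Frame], factors: Sequence[int]) -> List[Frame]:
    """Apply several stacking stages in sequence (hypothetical helper)."""
    result = list(frames)
    for factor in factors:
        result = stack_frames(result, factor)
    return result


if __name__ == "__main__":
    # 100 frames of 4-dimensional features, one frame per time step.
    features = [[float(t)] * 4 for t in range(100)]
    reduced = reduce_frame_rate(features, [2, 2, 2])
    print(len(reduced), len(reduced[0]))  # prints: 13 32
```

For example, three stages with factor 2 turn 100 input frames into 50, then 25, then 13 output frames, an overall reduction of roughly 8x; each extra stage roughly halves the number of frames the decoder has to process.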
