Search Results for author: Anirudh Raju

Found 21 papers, 1 papers with code

On Evaluating and Comparing Open Domain Dialog Systems

no code implementations • 11 Jan 2018 • Anu Venkatesh, Chandra Khatri, Ashwin Ram, Fenfei Guo, Raefer Gabriel, Ashish Nagar, Rohit Prasad, Ming Cheng, Behnam Hedayatnia, Angeliki Metallinou, Rahul Goel, Shaohua Yang, Anirudh Raju

In this paper, we propose a comprehensive evaluation strategy with multiple metrics designed to reduce subjectivity by selecting metrics which correlate well with human judgement.

Goal-Oriented Dialogue Systems Open-Domain Dialog

Paper
Add Code

Topic-based Evaluation for Conversational Bots

1 code implementation • 11 Jan 2018 • Fenfei Guo, Angeliki Metallinou, Chandra Khatri, Anirudh Raju, Anu Venkatesh, Ashwin Ram

Dialog evaluation is a challenging problem, especially for non task-oriented dialogs where conversational success is not well-defined.

Topic Classification

Paper
Code

Max-Pooling Loss Training of Long Short-Term Memory Networks for Small-Footprint Keyword Spotting

no code implementations • 5 May 2017 • Ming Sun, Anirudh Raju, George Tucker, Sankaran Panchapagesan, Geng-Shen Fu, Arindam Mandal, Spyros Matsoukas, Nikko Strom, Shiv Vitaladevuni

Finally, the max-pooling loss trained LSTM initialized with a cross-entropy pre-trained network shows the best performance, which yields $67. 6\%$ relative reduction compared to baseline feed-forward DNN in Area Under the Curve (AUC) measure.

Small-Footprint Keyword Spotting

Paper
Add Code

Contextual Language Model Adaptation for Conversational Agents

no code implementations • 26 Jun 2018 • Anirudh Raju, Behnam Hedayatnia, Linda Liu, Ankur Gandhe, Chandra Khatri, Angeliki Metallinou, Anu Venkatesh, Ariya Rastrow

Statistical language models (LM) play a key role in Automatic Speech Recognition (ASR) systems used by conversational agents.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Data Augmentation for Robust Keyword Spotting under Playback Interference

no code implementations • 1 Aug 2018 • Anirudh Raju, Sankaran Panchapagesan, Xing Liu, Arindam Mandal, Nikko Strom

Accurate on-device keyword spotting (KWS) with low false accept and false reject rate is crucial to customer experience for far-field voice control of conversational agents.

Acoustic echo cancellation Data Augmentation +1

Paper
Add Code

Improving noise robustness of automatic speech recognition via parallel data and teacher-student learning

no code implementations • 5 Jan 2019 • Ladislav Mošner, Minhua Wu, Anirudh Raju, Sree Hari Krishnan Parthasarathi, Kenichi Kumatani, Shiva Sundaram, Roland Maas, Björn Hoffmeister

For real-world speech recognition applications, noise robustness is still a challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Scalable Multi Corpora Neural Language Models for ASR

no code implementations • 2 Jul 2019 • Anirudh Raju, Denis Filimonov, Gautam Tiwari, Guitang Lan, Ariya Rastrow

Neural language models (NLM) have been shown to outperform conventional n-gram language models by a substantial margin in Automatic Speech Recognition (ASR) and other tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

no code implementations • 14 Aug 2020 • Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow

Finally, we contrast these methods to a jointly trained end-to-end joint SLU model, consisting of ASR and NLU subsystems which are connected by a neural network based interface instead of text, that produces transcripts as well as NLU interpretation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Multi-task Language Modeling for Improving Speech Recognition of Rare Words

no code implementations • 23 Nov 2020 • Chao-Han Huck Yang, Linda Liu, Ankur Gandhe, Yile Gu, Anirudh Raju, Denis Filimonov, Ivan Bulyko

We show that our rescoring model trained with these additional tasks outperforms the baseline rescoring model, trained with only the language modeling task, by 1. 4% on a general test and by 2. 6% on a rare word test set in terms of word-error-rate relative (WERR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations • 12 Feb 2021 • Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Listen with Intent: Improving Speech Recognition with Audio-to-Intent Front-End

no code implementations • 14 May 2021 • Swayambhu Nath Ray, Minhua Wu, Anirudh Raju, Pegah Ghahremani, Raghavendra Bilgi, Milind Rao, Harish Arsikere, Ariya Rastrow, Andreas Stolcke, Jasha Droppo

On the other hand, a streaming system using per-frame intent posteriors as extra inputs for the RNN-T ASR system yields a 3. 33% relative WERR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

On joint training with interfaces for spoken language understanding

no code implementations • 30 Jun 2021 • Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Attentive Contextual Carryover for Multi-Turn End-to-End Spoken Language Understanding

no code implementations • 13 Dec 2021 • Kai Wei, Thanh Tran, Feng-Ju Chang, Kanthashree Mysore Sathyendra, Thejaswi Muniyappa, Jing Liu, Anirudh Raju, Ross McGowan, Nathan Susanj, Ariya Rastrow, Grant P. Strimel

Recent years have seen significant advances in end-to-end (E2E) spoken language understanding (SLU) systems, which directly predict intents and slots from spoken audio.

Natural Language Understanding Spoken Language Understanding

Paper
Add Code

MTL-SLT: Multi-Task Learning for Spoken Language Tasks

no code implementations • NLP4ConvAI (ACL) 2022 • Zhiqi Huang, Milind Rao, Anirudh Raju, Zhe Zhang, Bach Bui, Chul Lee

The proposed framework benefits from three key aspects: 1) pre-trained sub-networks of ASR model and language model; 2) multi-task learning objective to exploit shared knowledge from different tasks; 3) end-to-end training of ASR and downstream NLP task based on sequence loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Paper
Add Code

ILASR: Privacy-Preserving Incremental Learning for Automatic Speech Recognition at Production Scale

no code implementations • 19 Jul 2022 • Gopinath Chennupati, Milind Rao, Gurpreet Chadha, Aaron Eakin, Anirudh Raju, Gautam Tiwari, Anit Kumar Sahu, Ariya Rastrow, Jasha Droppo, Andy Oberlin, Buddha Nandanoor, Prahalad Venkataramanan, Zheng Wu, Pankaj Sitpure

For end-to-end automatic speech recognition (ASR) tasks, the absence of human annotated labels along with the need for privacy preserving policies for model building makes it a daunting challenge.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities

no code implementations • 22 Jul 2022 • Pranav Dheram, Murugesan Ramakrishnan, Anirudh Raju, I-Fan Chen, Brian King, Katherine Powell, Melissa Saboowala, Karan Shetty, Andreas Stolcke

As for other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts.

Fairness speech-recognition +1

Paper
Add Code

Adaptive Endpointing with Deep Contextual Multi-armed Bandits

no code implementations • 23 Mar 2023 • Do June Min, Andreas Stolcke, Anirudh Raju, Colin Vaz, Di He, Venkatesh Ravichandran, Viet Anh Trinh

In this paper, we aim to provide a solution for adaptive endpointing by proposing an efficient method for choosing an optimal endpointing configuration given utterance-level audio features in an online setting, while avoiding hyperparameter grid-search.

Multi-Armed Bandits

Paper
Add Code

Cross-utterance ASR Rescoring with Graph-based Label Propagation

no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.

Fairness Language Modelling

Paper
Add Code

Federated Self-Learning with Weak Supervision for Speech Recognition

no code implementations • 21 Jun 2023 • Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Paper
Add Code

Two-pass Endpoint Detection for Speech Recognition

no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.

speech-recognition Speech Recognition

Paper
Add Code

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.