Search Results for author: Ariya Rastrow

Found 47 papers, 1 papers with code

Personalization for BERT-based Discriminative Speech Recognition Rescoring

no code implementations13 Jul 2023 Jari Kolehmainen, Yile Gu, Aditya Gourav, Prashanth Gurunath Shivakumar, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

On a test set with personalized named entities, we show that each of these approaches improves word error rate by over 10%, against a neural rescoring baseline.

speech-recognition Speech Recognition

Scaling Laws for Discriminative Speech Recognition Rescoring Models

no code implementations27 Jun 2023 Yile Gu, Prashanth Gurunath Shivakumar, Jari Kolehmainen, Ankur Gandhe, Ariya Rastrow, Ivan Bulyko

We study whether this scaling property is also applicable to second-pass rescoring, which is an important component of speech recognition systems.

speech-recognition Speech Recognition

Federated Self-Learning with Weak Supervision for Speech Recognition

no code implementations21 Jun 2023 Milind Rao, Gopinath Chennupati, Gautam Tiwari, Anit Kumar Sahu, Anirudh Raju, Ariya Rastrow, Jasha Droppo

Automatic speech recognition (ASR) models with low-footprint are increasingly being deployed on edge devices for conversational agents, which enhances privacy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Streaming Speech-to-Confusion Network Speech Recognition

no code implementations2 Jun 2023 Denis Filimonov, Prabhat Pandey, Ariya Rastrow, Ankur Gandhe, Andreas Stolcke

In interactive automatic speech recognition (ASR) systems, low-latency requirements limit the amount of search space that can be explored during decoding, particularly in end-to-end neural ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Personalized Predictive ASR for Latency Reduction in Voice Assistants

no code implementations23 May 2023 Andreas Schwarz, Di He, Maarten Van Segbroeck, Mohammed Hethnawi, Ariya Rastrow

Streaming Automatic Speech Recognition (ASR) in voice assistants can utilize prefetching to partially hide the latency of response generation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

PROCTER: PROnunciation-aware ConTextual adaptER for personalized speech recognition in neural transducers

no code implementations30 Mar 2023 Rahul Pandey, Roger Ren, Qi Luo, Jing Liu, Ariya Rastrow, Ankur Gandhe, Denis Filimonov, Grant Strimel, Andreas Stolcke, Ivan Bulyko

End-to-End (E2E) automatic speech recognition (ASR) systems used in voice assistants often have difficulties recognizing infrequent words personalized to the user, such as names and places.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

no code implementations6 Jan 2023 David M. Chan, Shalini Ghosh, Ariya Rastrow, Björn Hoffmeister

Despite improvements to the generalization performance of automated speech recognition (ASR) models, specializing ASR models for downstream tasks remains a challenging task, primarily due to reduced data availability (necessitating increased data collection), and rapidly shifting data distributions (requiring more frequent model fine-tuning).

Domain Adaptation speech-recognition +1

Compute Cost Amortized Transformer for Streaming ASR

no code implementations5 Jul 2022 Yi Xie, Jonathan Macoskey, Martin Radfar, Feng-Ju Chang, Brian King, Ariya Rastrow, Athanasios Mouchtaris, Grant P. Strimel

We present a streaming, Transformer-based end-to-end automatic speech recognition (ASR) architecture which achieves efficient neural inference through compute cost amortization.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

FANS: Fusing ASR and NLU for on-device SLU

no code implementations31 Oct 2021 Martin Radfar, Athanasios Mouchtaris, Siegfried Kunzmann, Ariya Rastrow

In this paper, we introduce FANS, a new end-to-end SLU model that fuses an ASR audio encoder to a multi-task NLU decoder to infer the intent, slot tags, and slot values directly from a given input audio, obviating the need for transcription.

Ranked #13 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Spoken Language Understanding

Bifocal Neural ASR: Exploiting Keyword Spotting for Inference Optimization

no code implementations3 Aug 2021 Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow

We present Bifocal RNN-T, a new variant of the Recurrent Neural Network Transducer (RNN-T) architecture designed for improved inference time latency on speech recognition tasks.

Inference Optimization Keyword Spotting +3

Amortized Neural Networks for Low-Latency Speech Recognition

no code implementations3 Aug 2021 Jonathan Macoskey, Grant P. Strimel, Jinru Su, Ariya Rastrow

We apply AmNets to the Recurrent Neural Network Transducer (RNN-T) to reduce compute cost and latency for an automatic speech recognition (ASR) task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Learning a Neural Diff for Speech Models

no code implementations3 Aug 2021 Jonathan Macoskey, Grant P. Strimel, Ariya Rastrow

As more speech processing applications execute locally on edge devices, a set of resource constraints must be considered.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

On joint training with interfaces for spoken language understanding

no code implementations30 Jun 2021 Anirudh Raju, Milind Rao, Gautam Tiwari, Pranav Dheram, Bryan Anderson, Zhe Zhang, Chul Lee, Bach Bui, Ariya Rastrow

Spoken language understanding (SLU) systems extract both text transcripts and semantics associated with intents and slots from input speech utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Do You Listen with One or Two Microphones? A Unified ASR Model for Single and Multi-Channel Audio

no code implementations4 Jun 2021 Gokce Keskin, Minhua Wu, Brian King, Harish Mallidi, Yang Gao, Jasha Droppo, Ariya Rastrow, Roland Maas

An ASR model that operates on both primary and auxiliary data can achieve better accuracy compared to a primary-only solution; and a model that can serve both primary-only (PO) and primary-plus-auxiliary (PPA) modes is highly desirable.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Attention-based Contextual Language Model Adaptation for Speech Recognition

1 code implementation Findings (ACL) 2021 Richard Diehl Martinez, Scott Novotney, Ivan Bulyko, Ariya Rastrow, Andreas Stolcke, Ankur Gandhe

When applied to a large de-identified dataset of utterances collected by a popular voice assistant platform, our method reduces perplexity by 7. 0% relative over a standard LM that does not incorporate contextual information.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2. 0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization Representation Learning +1

Personalization Strategies for End-to-End Speech Recognition Systems

no code implementations15 Feb 2021 Aditya Gourav, Linda Liu, Ankur Gandhe, Yile Gu, Guitang Lan, Xiangyang Huang, Shashank Kalmane, Gautam Tiwari, Denis Filimonov, Ariya Rastrow, Andreas Stolcke, Ivan Bulyko

We also describe a novel second-pass de-biasing approach: used in conjunction with a first-pass shallow fusion that optimizes on oracle WER, we can achieve an additional 14% improvement on personalized content recognition, and even improve accuracy for the general use case by up to 2. 5%.

speech-recognition Speech Recognition

Do as I mean, not as I say: Sequence Loss Training for Spoken Language Understanding

no code implementations12 Feb 2021 Milind Rao, Pranav Dheram, Gautam Tiwari, Anirudh Raju, Jasha Droppo, Ariya Rastrow, Andreas Stolcke

Spoken language understanding (SLU) systems extract transcriptions, as well as semantics of intent or named entities from speech, and are essential components of voice activated systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Domain-aware Neural Language Models for Speech Recognition

no code implementations5 Jan 2021 Linda Liu, Yile Gu, Aditya Gourav, Ankur Gandhe, Shashank Kalmane, Denis Filimonov, Ariya Rastrow, Ivan Bulyko

As voice assistants become more ubiquitous, they are increasingly expected to support and perform well on a wide variety of use-cases across different domains.

Domain Adaptation Language Modelling +2

Improving accuracy of rare words for RNN-Transducer through unigram shallow fusion

no code implementations30 Nov 2020 Vijay Ravi, Yile Gu, Ankur Gandhe, Ariya Rastrow, Linda Liu, Denis Filimonov, Scott Novotney, Ivan Bulyko

We show that this simple method can improve performance on rare words by 3. 7% WER relative without degradation on general test set, and the improvement from USF is additive to any additional language model based rescoring.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces

no code implementations14 Aug 2020 Milind Rao, Anirudh Raju, Pranav Dheram, Bach Bui, Ariya Rastrow

Finally, we contrast these methods to a jointly trained end-to-end joint SLU model, consisting of ASR and NLU subsystems which are connected by a neural network based interface instead of text, that produces transcripts as well as NLU interpretation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Neural Composition: Learning to Generate from Multiple Models

no code implementations10 Jul 2020 Denis Filimonov, Ravi Teja Gadde, Ariya Rastrow

Decomposing models into multiple components is critically important in many applications such as language modeling (LM) as it enables adapting individual components separately and biasing of some components to the user's personal preferences.

Language Modelling

Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

no code implementations25 Jun 2020 Alex Sokolov, Tracy Rohlin, Ariya Rastrow

Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" to "E k oU").

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Streaming Language Identification using Combination of Acoustic Representations and ASR Hypotheses

no code implementations1 Jun 2020 Chander Chandak, Zeynab Raeesy, Ariya Rastrow, Yuzong Liu, Xiangyang Huang, Siyu Wang, Dong Kwon Joo, Roland Maas

A common approach to solve multilingual speech recognition is to run multiple monolingual ASR systems in parallel and rely on a language identification (LID) component that detects the input language.

Language Identification speech-recognition +1

Audio-attention discriminative language model for ASR rescoring

no code implementations6 Dec 2019 Ankur Gandhe, Ariya Rastrow

In this work, we propose to combine the benefits of end-to-end approaches with a conventional system using an attention-based discriminative language model that learns to rescore the output of a first-pass ASR system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Scalable Multi Corpora Neural Language Models for ASR

no code implementations2 Jul 2019 Anirudh Raju, Denis Filimonov, Gautam Tiwari, Guitang Lan, Ariya Rastrow

Neural language models (NLM) have been shown to outperform conventional n-gram language models by a substantial margin in Automatic Speech Recognition (ASR) and other tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Scalable language model adaptation for spoken dialogue systems

no code implementations11 Dec 2018 Ankur Gandhe, Ariya Rastrow, Bjorn Hoffmeister

New application intents and interaction types are released for these systems over time, imposing challenges to adapt the LMs since the existing training data is no longer sufficient to model the future user interactions.

Language Modelling speech-recognition +2

LSTM-based Whisper Detection

no code implementations20 Sep 2018 Zeynab Raeesy, Kellen Gillespie, Zhenpei Yang, Chengyuan Ma, Thomas Drugman, Jiacheng Gu, Roland Maas, Ariya Rastrow, Björn Hoffmeister

We prove that, with enough data, the LSTM model is indeed as capable of learning whisper characteristics from LFBE features alone compared to a simpler MLP model that uses both LFBE and features engineered for separating whisper and normal speech.


Device-directed Utterance Detection

no code implementations7 Aug 2018 Sri Harish Mallidi, Roland Maas, Kyle Goehner, Ariya Rastrow, Spyros Matsoukas, Björn Hoffmeister

In this work, we propose a classifier for distinguishing device-directed queries from background speech in the context of interactions with voice assistants.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Just ASK: Building an Architecture for Extensible Self-Service Spoken Language Understanding

no code implementations1 Nov 2017 Anjishnu Kumar, Arpit Gupta, Julian Chan, Sam Tucker, Bjorn Hoffmeister, Markus Dreyer, Stanislav Peshterliev, Ankur Gandhe, Denis Filiminov, Ariya Rastrow, Christian Monson, Agnika Kumar

This paper presents the design of the machine learning architecture that underlies the Alexa Skills Kit (ASK) a large scale Spoken Language Understanding (SLU) Software Development Kit (SDK) that enables developers to extend the capabilities of Amazon's virtual assistant, Alexa.

Spoken Language Understanding

Cannot find the paper you are looking for? You can Submit a new open access paper.