Search Results for author: Pranay Dighe

Found 12 papers, 0 papers with code

Modality Dropout for Multimodal Device Directed Speech Detection using Verbal and Non-Verbal Features

no code implementations • 23 Oct 2023 • Gautam Krishna, Sameer Dharur, Oggi Rudovic, Pranay Dighe, Saurabh Adya, Ahmed Hussen Abdelaziz, Ahmed H Tewfik

Device-directed speech detection (DDSD) is the binary classification task of distinguishing queries directed at a voice assistant from side conversation or background speech.

Automatic Speech Recognition, Binary Classification, +2

Leveraging Large Language Models for Exploiting ASR Uncertainty

no code implementations • 9 Sep 2023 • Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality.

Automatic Speech Recognition (ASR), +7

Audio-to-Intent Using Acoustic-Textual Subword Representations from End-to-End ASR

no code implementations • 21 Oct 2022 • Pranay Dighe, Prateeth Nayak, Oggi Rudovic, Erik Marchi, Xiaochuan Niu, Ahmed Tewfik

Accurate prediction of the user's intent to interact with a voice assistant (VA) on a device (e.g., on the phone) is critical for achieving naturalistic, engaging, and privacy-centric interactions with the VA. To this end, we present a novel approach to predict the user's intent (the user speaking to the device or not) directly from acoustic and textual information encoded at subword tokens, which are obtained via an end-to-end ASR model.

Intent Classification

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

no code implementations • 30 Mar 2022 • Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

We also show that the ensemble of the LatticeRNN and acoustic-distilled models brings further accuracy improvement of 20%.

Knowledge Distillation

Streaming on-device detection of device directed speech from voice and touch-based invocation

no code implementations • 9 Oct 2021 • Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device.

Computational Efficiency

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features.

Automatic Speech Recognition (ASR), +1

Knowledge Transfer for Efficient On-device False Trigger Mitigation

no code implementations • 20 Oct 2020 • Pranay Dighe, Erik Marchi, Srikanth Vishnubhotla, Sachin Kajarekar, Devang Naik

But in case of a false trigger, transcribing the audio using ASR itself is strongly undesirable.

Automatic Speech Recognition (ASR), +2

Complementary Language Model and Parallel Bi-LRNN for False Trigger Mitigation

no code implementations • 18 Aug 2020 • Rishika Agarwal, Xiaochuan Niu, Pranay Dighe, Srikanth Vishnubhotla, Sameer Badaskar, Devang Naik

In this paper, we propose a novel solution to the FTM problem by introducing a parallel ASR decoding process with a special language model trained from "out-of-domain" data sources.

Language Modelling

Lattice-based Improvements for Voice Triggering Using Graph Neural Networks

no code implementations • 25 Jan 2020 • Pranay Dighe, Saurabh Adya, Nuoyu Li, Srikanth Vishnubhotla, Devang Naik, Adithya Sagar, Ying Ma, Stephen Pulman, Jason Williams

A pure trigger-phrase detector model doesn't fully utilize the intent of the user speech, whereas by using the complete decoding lattice of the user audio we can effectively mitigate speech not intended for the smart assistant.

Automatic Speech Recognition (ASR), +1

Information Theoretic Analysis of DNN-HMM Acoustic Modeling

no code implementations • 29 Aug 2017 • Pranay Dighe, Afsaneh Asaei, Hervé Bourlard

We propose an information theoretic framework for quantitative assessment of acoustic modeling for hidden Markov model (HMM) based automatic speech recognition (ASR).

Automatic Speech Recognition (ASR), +1

Low-rank and Sparse Soft Targets to Learn Better DNN Acoustic Models

no code implementations • 18 Oct 2016 • Pranay Dighe, Afsaneh Asaei, Herve Bourlard

Conventional deep neural networks (DNN) for speech acoustic modeling rely on Gaussian mixture models (GMM) and hidden Markov models (HMM) to obtain binary class labels as the targets for DNN training.

Speech Recognition

Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition

no code implementations • 22 Jan 2016 • Pranay Dighe, Gil Luyet, Afsaneh Asaei, Herve Bourlard

We propose to model the acoustic space of deep neural network (DNN) class-conditional posterior probabilities as a union of low-dimensional subspaces.

Dictionary Learning, Speech Recognition, +1
