no code implementations • 13 Sep 2024 • Arnav Kundu, Yanzi Jin, Mohammad Sekhavat, Max Horton, Danny Tormoen, Devang Naik
This paper addresses the challenging task of Active Speaker Detection (ASD), in which a system must determine in real time whether a person in a sequence of video frames is speaking.
Active Speaker Detection • Audio-Visual Active Speaker Detection
no code implementations • 4 Jun 2024 • Arnav Kundu, Prateeth Nayak, Priyanka Padmanabhan, Devang Naik
Always-on machine learning models require a very low memory and compute footprint.
no code implementations • 9 Oct 2023 • Utkarsh Oggy Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik
Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms.
no code implementations • 14 Mar 2023 • Arnav Kundu, Chungkuk Yoo, Srijan Mishra, Minsik Cho, Saurabh Adya
To overcome the challenge, we focus on the outliers in a pre-trained model's weights, which disrupt effective low-bit quantization and compression.
Ranked #1 on Model Compression on QNLI
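Why outliers hurt low-bit quantization can be seen with a small sketch. It assumes plain symmetric min-max uniform quantization, a generic scheme and not the method proposed in the paper, and the weight values are made up. Because the step size is set by the largest absolute weight, one outlier stretches the grid and most ordinary weights collapse to zero.

```python
def quantize_symmetric(weights, bits):
    """Round each weight to the nearest level of a symmetric uniform grid.

    Generic min-max scheme (an assumption for illustration): the step size
    comes from the largest absolute weight, so one outlier coarsens the grid
    for every other weight.
    """
    qmax = 2 ** (bits - 1) - 1  # e.g. 7 levels on each side for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax
    return [round(w / scale) * scale for w in weights]


def mse(a, b):
    """Mean squared error between two equal-length lists."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical weights: a small-valued tensor, then the same tensor plus one outlier.
base = [0.10, -0.20, 0.05, 0.30, -0.15, 0.25, -0.05, 0.12]
with_outlier = base + [4.0]

# 4-bit error on the ordinary weights without the outlier present.
err_base = mse(base, quantize_symmetric(base, bits=4))

# 4-bit error on the same ordinary weights once the outlier sets the scale.
q = quantize_symmetric(with_outlier, bits=4)
err_inliers = mse(base, q[: len(base)])

print(f"error without outlier: {err_base:.6f}")
print(f"error with outlier:    {err_inliers:.6f}")
```

With the outlier present, nearly all of the ordinary weights round to zero, so their error grows by orders of magnitude even though they did not change.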
no code implementations • 26 Oct 2022 • Arnav Kundu, Mohammad Samragh Razlighi, Minsik Cho, Priyanka Padmanabhan, Devang Naik
Streaming keyword spotting is a widely used solution for activating voice assistants.
Ranked #1 on Keyword Spotting on hey Siri
no code implementations • 24 Oct 2022 • Mohammad Samragh, Arnav Kundu, Ting-yao Hu, Minsik Cho, Aman Chadha, Ashish Shrivastava, Oncel Tuzel, Devang Naik
This paper explores the possibility of using visual object detection techniques for word localization in speech data.
no code implementations • 2 Nov 2020 • Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel
In prior methods, the DNN is trained independently of the HMM parameters to minimize the cross-entropy loss between the predicted and ground-truth state probabilities.
Ranked #2 on Keyword Spotting on hey Siri
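The frame-level objective the excerpt attributes to prior methods can be sketched as follows. This is a minimal illustration of standard cross-entropy training against HMM state targets, not the training scheme proposed in the paper; the three-state model, the logits, and the one-hot alignment targets are all hypothetical.

```python
import math


def softmax(logits):
    """Convert one frame's logits into state probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def frame_cross_entropy(pred_probs, target_probs, eps=1e-12):
    """Average over frames of -sum_s target[s] * log(pred[s]).

    pred_probs and target_probs are lists of per-frame state distributions;
    eps guards against log(0).
    """
    total = 0.0
    for p, t in zip(pred_probs, target_probs):
        total += -sum(ti * math.log(max(pi, eps)) for pi, ti in zip(p, t))
    return total / len(pred_probs)


# Hypothetical 3-state HMM, 2 frames, with one-hot targets from a forced alignment.
logits = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
targets = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

pred = [softmax(z) for z in logits]
loss = frame_cross_entropy(pred, targets)
print(f"frame-level cross-entropy: {loss:.4f}")
```

Because the targets here are fixed alignments, the DNN loss does not depend on the HMM parameters, which matches the excerpt's description of the DNN being trained independently of them.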