no code implementations • 12 May 2023 • Peter Plantinga, Jaekwon Yoo, Chandra Dhir
Our experiments show that a simple linear interpolation of the parameters of several models, each fine-tuned from the same generalist model, yields a single model that performs well on all tested data.
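A minimal sketch of this kind of parameter interpolation in PyTorch is below; the equal weighting, checkpoint paths, and `GeneralistModel` placeholder are illustrative assumptions rather than details from the paper.

```python
# Hypothetical sketch: linearly interpolate the parameters of several models
# fine-tuned from the same generalist checkpoint. All state dicts must share
# the same architecture (same keys and tensor shapes).
import torch

def interpolate_state_dicts(state_dicts, weights=None):
    """Weighted average of parameter tensors; equal weights by default."""
    if weights is None:
        weights = [1.0 / len(state_dicts)] * len(state_dicts)
    merged = {}
    for name, ref in state_dicts[0].items():
        if ref.is_floating_point():
            merged[name] = sum(w * sd[name] for w, sd in zip(weights, state_dicts))
        else:
            merged[name] = ref.clone()  # integer buffers: keep the first model's copy
    return merged

# Usage (paths and model class are placeholders):
# sds = [torch.load(p, map_location="cpu") for p in ["ft_a.pt", "ft_b.pt", "ft_c.pt"]]
# model = GeneralistModel()
# model.load_state_dict(interpolate_state_dicts(sds))
```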
no code implementations • 5 Apr 2022 • Prateeth Nayak, Takuya Higuchi, Anmol Gupta, Shivesh Ranjan, Stephen Shum, Siddharth Sigtia, Erik Marchi, Varun Lakshminarasimhan, Minsik Cho, Saurabh Adya, Chandra Dhir, Ahmed Tewfik
A detector is typically trained on speech data, independently of speaker information, and then used for the voice trigger detection task.
no code implementations • 30 Mar 2022 • Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik
We also show that an ensemble of the LatticeRNN and acoustic-distilled models brings a further accuracy improvement of 20%.
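As a rough illustration of a score-level ensemble of two detectors (the combination rule here is an assumption; the paper may fuse the models differently):

```python
# Hypothetical score-level ensemble of two mitigation models: take a convex
# combination of their per-utterance scores and threshold the result.
import numpy as np

def ensemble_scores(lattice_rnn_scores, distilled_scores, weight=0.5):
    """Convex combination of two equally sized score arrays."""
    a = np.asarray(lattice_rnn_scores, dtype=float)
    b = np.asarray(distilled_scores, dtype=float)
    return weight * a + (1.0 - weight) * b

# decisions = ensemble_scores(s1, s2) > threshold  # threshold tuned on dev data
```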
no code implementations • 15 Jul 2021 • Takuya Higuchi, Anmol Gupta, Chandra Dhir
In this approach, the output of an acoustic model is split into two branches for the two tasks, one for phoneme transcription trained with the ASR data and one for keyword classification trained with the KWS data; a minimal sketch of this branching follows this entry.
Automatic Speech Recognition (ASR)
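The following PyTorch sketch illustrates the shared-encoder, two-head structure described above; the layer types, sizes, and class counts are assumptions for illustration, not the paper's configuration.

```python
# Hypothetical two-branch acoustic model: a shared encoder whose output feeds a
# phoneme head (trained on ASR data) and a keyword head (trained on KWS data).
import torch
import torch.nn as nn

class TwoBranchAcousticModel(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, n_phonemes=42, n_keywords=2):
        super().__init__()
        self.encoder = nn.LSTM(feat_dim, hidden, num_layers=2, batch_first=True)
        self.phoneme_head = nn.Linear(hidden, n_phonemes)   # ASR branch
        self.keyword_head = nn.Linear(hidden, n_keywords)   # KWS branch

    def forward(self, feats):                 # feats: (batch, time, feat_dim)
        h, _ = self.encoder(feats)            # (batch, time, hidden)
        return self.phoneme_head(h), self.keyword_head(h.mean(dim=1))

# Training alternates (or mixes) batches: cross-entropy on frame-level phoneme
# targets for ASR batches, cross-entropy on keyword labels for KWS batches.
```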
no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir
We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features; a chunk-wise sketch follows this entry.
Automatic Speech Recognition (ASR)
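A simplified sketch of chunk-wise streaming with a transformer encoder: each incoming chunk is encoded together with a bounded cache of past frames, and only the new chunk's outputs are emitted. The sizes and the caching scheme are assumptions for illustration, not the paper's architecture.

```python
# Hypothetical chunk-streaming wrapper around a transformer encoder: keep a
# rolling cache of past projected frames as left context for each new chunk.
import torch
import torch.nn as nn

class StreamingEncoder(nn.Module):
    def __init__(self, feat_dim=80, d_model=256, n_layers=4, context_frames=64):
        super().__init__()
        self.proj = nn.Linear(feat_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.context_frames = context_frames
        self.cache = None  # past frames, (batch, <=context_frames, d_model)

    def forward_chunk(self, chunk_feats):               # (batch, chunk, feat_dim)
        x = self.proj(chunk_feats)
        full = x if self.cache is None else torch.cat([self.cache, x], dim=1)
        out = self.encoder(full)[:, -x.size(1):]         # keep only the new chunk
        self.cache = full[:, -self.context_frames:].detach()
        return out                                       # fed to VTD / FTM heads

# enc = StreamingEncoder()
# for chunk in audio_chunks:                             # (batch, frames, feat_dim)
#     h = enc.forward_chunk(chunk)
```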
no code implementations • 18 Feb 2021 • Takuya Higuchi, Shreyas Saxena, Mehrez Souden, Tien Dung Tran, Masood Delfarah, Chandra Dhir
We propose dynamic curriculum learning via data parameters for noise-robust keyword spotting.
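A rough sketch of the data-parameters idea: each training example gets a learnable temperature that scales its logits before the softmax, so harder or noisier examples can be down-weighted as training progresses. The parameterization and optimizer choices below are assumptions, not the paper's recipe.

```python
# Hypothetical per-sample data parameters: learnable temperatures that divide
# each example's logits before the cross-entropy loss.
import torch
import torch.nn.functional as F

num_examples, num_classes, feat_dim = 1000, 10, 40
model = torch.nn.Linear(feat_dim, num_classes)              # stand-in keyword classifier
log_sigma = torch.zeros(num_examples, requires_grad=True)   # per-sample log-temperature

opt_model = torch.optim.SGD(model.parameters(), lr=0.1)
opt_sigma = torch.optim.SGD([log_sigma], lr=0.1)

def training_step(feats, labels, idx):
    """feats: (B, feat_dim), labels: (B,), idx: (B,) dataset indices."""
    logits = model(feats)
    sigma = torch.exp(log_sigma[idx]).unsqueeze(1)           # (B, 1), always positive
    loss = F.cross_entropy(logits / sigma, labels)
    opt_model.zero_grad(); opt_sigma.zero_grad()
    loss.backward()
    opt_model.step(); opt_sigma.step()
    return loss.item()
```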
no code implementations • 2 Nov 2020 • Ashish Shrivastava, Arnav Kundu, Chandra Dhir, Devang Naik, Oncel Tuzel
In prior methods, the DNN is trained independently of the HMM parameters to minimize the cross-entropy loss between the predicted and the ground-truth state probabilities; a toy version of this frame-level objective is sketched after this entry.
Ranked #2 on Keyword Spotting on hey Siri
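For context, a toy version of the conventional recipe described above: the DNN is fit with cross-entropy to HMM-state labels obtained from a forced alignment, with no gradient flowing from the HMM. The feature dimension, state inventory, and network shape are assumptions.

```python
# Hypothetical frame-level DNN training against forced-alignment state labels;
# the HMM parameters themselves are not part of this optimization.
import torch
import torch.nn as nn

n_states, feat_dim = 20, 40                    # illustrative sizes
dnn = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(), nn.Linear(128, n_states))
optim = torch.optim.Adam(dnn.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

def train_step(frames, state_labels):
    """frames: (num_frames, feat_dim); state_labels: (num_frames,) from alignment."""
    loss = ce(dnn(frames), state_labels)
    optim.zero_grad(); loss.backward(); optim.step()
    return loss.item()
```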
no code implementations • 5 Aug 2020 • Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir
Our baseline is an acoustic model (AM) with BiLSTM layers, trained by minimizing the CTC loss.
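A compact sketch of such a baseline, assuming PyTorch's built-in `nn.CTCLoss`; the label inventory, layer sizes, and blank index are illustrative.

```python
# Hypothetical BiLSTM acoustic model trained with CTC; sizes are illustrative.
import torch
import torch.nn as nn

class BiLSTMAM(nn.Module):
    def __init__(self, feat_dim=40, hidden=256, n_labels=30):   # label 0 = blank
        super().__init__()
        self.lstm = nn.LSTM(feat_dim, hidden, num_layers=3,
                            bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_labels)

    def forward(self, feats):                     # (batch, time, feat_dim)
        h, _ = self.lstm(feats)
        return self.out(h).log_softmax(dim=-1)    # (batch, time, n_labels)

am = BiLSTMAM()
ctc = nn.CTCLoss(blank=0)

def ctc_step(feats, targets, input_lengths, target_lengths):
    log_probs = am(feats).transpose(0, 1)         # CTCLoss expects (time, batch, labels)
    return ctc(log_probs, targets, input_lengths, target_lengths)
```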
no code implementations • 9 Mar 2020 • Ting-yao Hu, Ashish Shrivastava, Oncel Tuzel, Chandra Dhir
We present a method to generate speech from input text and a style vector that is extracted from a reference speech signal in an unsupervised manner, i.e., no style annotation, such as speaker information, is required.
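A very rough sketch of this kind of conditioning: a reference encoder maps a reference mel-spectrogram to a style vector, which is broadcast and concatenated with the text encoding before decoding. All module shapes are assumptions; the paper's actual architecture and unsupervised style-extraction objective are not reproduced here.

```python
# Hypothetical style-conditioned TTS skeleton: a reference encoder produces a
# style vector that conditions a simple text-to-mel decoder by concatenation.
import torch
import torch.nn as nn

class ReferenceEncoder(nn.Module):
    def __init__(self, n_mels=80, style_dim=128):
        super().__init__()
        self.gru = nn.GRU(n_mels, style_dim, batch_first=True)

    def forward(self, ref_mel):                  # (batch, frames, n_mels)
        _, h = self.gru(ref_mel)
        return h[-1]                             # (batch, style_dim)

class StyleConditionedTTS(nn.Module):
    def __init__(self, vocab=100, text_dim=256, style_dim=128, n_mels=80):
        super().__init__()
        self.embed = nn.Embedding(vocab, text_dim)
        self.ref_enc = ReferenceEncoder(n_mels, style_dim)
        self.decoder = nn.GRU(text_dim + style_dim, 256, batch_first=True)
        self.to_mel = nn.Linear(256, n_mels)

    def forward(self, text_ids, ref_mel):        # text_ids: (batch, text_len)
        style = self.ref_enc(ref_mel)            # (batch, style_dim)
        txt = self.embed(text_ids)               # (batch, text_len, text_dim)
        style_rep = style.unsqueeze(1).expand(-1, txt.size(1), -1)
        h, _ = self.decoder(torch.cat([txt, style_rep], dim=-1))
        return self.to_mel(h)                    # coarse mel prediction
```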