1 code implementation • 20 Dec 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Specifically, first, we perform vanilla continued pre-training on an initial SSL pre-trained model on the target domain ASR dataset and call it the teacher.
Automatic Speech Recognition (ASR)
1 code implementation • 20 Dec 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online.
no code implementations • 31 May 2023 • Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh
Building a multilingual Automatic Speech Recognition (ASR) system in a linguistically diverse country like India is challenging because of the differences in scripts and the limited availability of speech data.
1 code implementation • 10 Mar 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.
no code implementations • 3 Nov 2022 • Vrunda N. Sukhadia, A. Arunkumar, S. Umesh
Our method gives a relative improvement of ~4% over the joint encoder-decoder self-supervised model built with simple pooling of data, which serves as our baseline.
1 code implementation • 2 Nov 2022 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification.
1 code implementation • 2 Nov 2022 • Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha
We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).
1 code implementation • 2 Nov 2022 • Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh
In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data.
Automatic Speech Recognition (ASR), Representation Learning
1 code implementation • 5 Oct 2022 • Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh
While Self-Supervised Learning has helped reap the benefits of scale from the available unlabeled data, the learning paradigms are continually being improved.
no code implementations • 11 Jun 2022 • A Arunkumar, Vrunda N Sukhadia, S. Umesh
To this end, we use three SSL models that have shown excellent results on ASR tasks, namely HuBERT, wav2vec 2.0, and WavLM.
Automatic Speech Recognition (ASR)
1 code implementation • 31 Mar 2022 • Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data.
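The excerpt above describes zeroing out redundant weights but does not say how redundancy is decided. As a rough illustration only, the sketch below assumes plain magnitude-based pruning, which may differ from the criterion PADA actually uses. The function name `magnitude_prune`, the `sparsity` parameter, and the toy weight list are hypothetical and not taken from the paper.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction set to 0.0.

    Assumption: "redundant" is approximated by small absolute value.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    to_zero = set(order[:n_prune])
    return [0.0 if i in to_zero else w for i, w in enumerate(weights)]


# Toy stand-in for the weights of a model pre-trained on out-of-domain data.
pretrained = [0.8, -0.05, 0.3, 0.01, -0.6, 0.02]
pruned = magnitude_prune(pretrained, sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.3, 0.0, -0.6, 0.0]
```

In a domain-adaptation setting, the zeroed positions would presumably be free to take new values during fine-tuning on target-domain data; that step is not shown here.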
no code implementations • 31 Mar 2022 • Ashish Seth, Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh
Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings.
Automatic Speech Recognition (ASR)
1 code implementation • 30 Mar 2022 • Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh
Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text.
1 code implementation • 25 Mar 2022 • Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh
Inspired by the recent progress in self-supervised learning for computer vision, in this paper we introduce DeLoRes, a new general-purpose audio representation learning approach.
no code implementations • 18 Feb 2022 • Vrunda N. Sukhadia, S. Umesh
We, therefore, propose to use the embeddings tapped from these encoder layers as features for a downstream Conformer target-domain model and show that they provide significant improvements.
Automatic Speech Recognition (ASR)
1 code implementation • 17 Oct 2021 • Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh
We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations.
1 code implementation • 14 Oct 2021 • Sreyan Ghosh, Samden Lepcha, S Sakshi, Rajiv Ratn Shah, S. Umesh
We believe that our dataset will act as a benchmark for the relatively new and unexplored Spoken Language Processing task of detecting toxicity from spoken utterances and will boost further research in this space.
no code implementations • 7 Aug 2020 • Vishwas M. Shetty, Metilda Sagaya Mary N J, S. Umesh
We present speaker information in the form of speaker embeddings for each of the speakers.
Automatic Speech Recognition (ASR)
no code implementations • 15 Jul 2013 • D. S. Pavan Kumar, N. Vishnu Prasad, Vikas Joshi, S. Umesh
In this paper, we propose a modification to the training process of the popular SPLICE algorithm for noise-robust speech recognition.