Search Results for author: S. Umesh

Found 20 papers, 12 papers with code

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

1 code implementation · 20 Dec 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Specifically, first, we perform vanilla continued pre-training on an initial SSL pre-trained model on the target domain ASR dataset and call it the teacher.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

1 code implementation · 20 Dec 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online.

Domain Adaptation, Self-Supervised Learning

The Tag-Team Approach: Leveraging CLS and Language Tagging for Enhancing Multilingual ASR

no code implementations · 31 May 2023 · Kaousheik Jayakumar, Vrunda N. Sukhadia, A Arunkumar, S. Umesh

Building a multilingual Automated Speech Recognition (ASR) system in a linguistically diverse country like India can be a challenging task due to the differences in scripts and the limited availability of speech data.

speech-recognition, Speech Recognition, +1

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

1 code implementation · 10 Mar 2023 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.

Audio Classification, Self-Supervised Learning

Channel-Aware Pretraining of Joint Encoder-Decoder Self-Supervised Model for Telephonic-Speech ASR

no code implementations · 3 Nov 2022 · Vrunda N. Sukhadia, A. Arunkumar, S. Umesh

Our method gives a relative improvement of ~4% over the joint encoder-decoder self-supervised model built with simple pooling of data, which serves as our baseline.

Clustering

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

1 code implementation · 2 Nov 2022 · Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification.

Audio Classification, Clustering, +3

MAST: Multiscale Audio Spectrogram Transformers

1 code implementation · 2 Nov 2022 · Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).

Audio Classification, Keyword Spotting, +1

data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

1 code implementation · 2 Nov 2022 · Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data.

Automatic Speech Recognition (ASR), Representation Learning, +1

CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

1 code implementation · 5 Oct 2022 · Vasista Sai Lodagala, Sreyan Ghosh, S. Umesh

While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered.

Automatic Speech Recognition (ASR), Clustering, +2

PADA: Pruning Assisted Domain Adaptation for Self-Supervised Speech Representations

1 code implementation · 31 Mar 2022 · Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh

To alleviate this issue, we propose PADA (Pruning Assisted Domain Adaptation) and zero out redundant weights from models pre-trained on large amounts of out-of-domain (OOD) data.

Domain Adaptation, Language Modelling, +1

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

no code implementations · 31 Mar 2022 · Ashish Seth, Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh

Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

Span Classification with Structured Information for Disfluency Detection in Spoken Utterances

1 code implementation · 30 Mar 2022 · Sreyan Ghosh, Sonal Kumar, Yaman Kumar Singla, Rajiv Ratn Shah, S. Umesh

Existing approaches in disfluency detection focus on solving a token-level classification task for identifying and removing disfluencies in text.

Classification

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

1 code implementation · 25 Mar 2022 · Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh

Inspired by the recent progress in self-supervised learning for computer vision, in this paper we introduce DeLoRes, a new general-purpose audio representation learning approach.

Representation Learning, Self-Supervised Learning, +1

Domain Adaptation of low-resource Target-Domain models using well-trained ASR Conformer Models

no code implementations · 18 Feb 2022 · Vrunda N. Sukhadia, S. Umesh

We, therefore, propose to use the embeddings tapped from these encoder layers as features for a downstream Conformer target-domain model and show that they provide significant improvements.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

DECAR: Deep Clustering for learning general-purpose Audio Representations

1 code implementation · 17 Oct 2021 · Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh

We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations.

Clustering, Deep Clustering, +2

DeToxy: A Large-Scale Multimodal Dataset for Toxicity Classification in Spoken Utterances

1 code implementation · 14 Oct 2021 · Sreyan Ghosh, Samden Lepcha, S Sakshi, Rajiv Ratn Shah, S. Umesh

We believe that our dataset would act as a benchmark for the relatively new and un-explored Spoken Language Processing task of detecting toxicity from spoken utterances and boost further research in this space.

Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

no code implementations · 15 Jul 2013 · D. S. Pavan Kumar, N. Vishnu Prasad, Vikas Joshi, S. Umesh

In this paper, a modification to the training process of the popular SPLICE algorithm has been proposed for noise robust speech recognition.

Robust Speech Recognition, speech-recognition
