Search Results for author: Samik Sadhu

Found 7 papers, 1 paper with code

Wav2vec-C: A Self-supervised Model for Speech Representation Learning

no code implementations · 9 Mar 2021 Samik Sadhu, Di He, Che-Wei Huang, Sri Harish Mallidi, Minhua Wu, Ariya Rastrow, Andreas Stolcke, Jasha Droppo, Roland Maas

However, the quantization process is regularized by an additional consistency network that learns to reconstruct the input features to the wav2vec 2.0 network from the quantized representations in a way similar to a VQ-VAE model.

Quantization · Representation Learning +1
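
The consistency network acts as a VQ-VAE-style decoder: it reconstructs the encoder's input features from the quantized codes, and the reconstruction error regularizes the quantizer. The following is a minimal PyTorch sketch of that idea only; the layer sizes, tensor shapes, and the name ConsistencyNet are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a consistency network that
# reconstructs the input features from quantized codes, VQ-VAE style.
import torch
import torch.nn as nn

class ConsistencyNet(nn.Module):
    """Decoder mapping quantized codes back to the input feature space."""
    def __init__(self, code_dim=256, feat_dim=80):
        super().__init__()
        self.decode = nn.Sequential(
            nn.Linear(code_dim, 512), nn.ReLU(), nn.Linear(512, feat_dim)
        )

    def forward(self, q):          # q: (batch, time, code_dim)
        return self.decode(q)      # -> (batch, time, feat_dim)

def wav2vec_c_loss(contrastive_loss, quantized, features,
                   consistency_net, gamma=1.0):
    # Reconstruct the original features from the quantized representations
    # and penalize the reconstruction error, regularizing the quantizer.
    recon = consistency_net(quantized)
    consistency_loss = nn.functional.mse_loss(recon, features)
    return contrastive_loss + gamma * consistency_loss
```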

Radically Old Way of Computing Spectra: Applications in End-to-End ASR

2 code implementations · 25 Mar 2021 Samik Sadhu, Hynek Hermansky

We propose a technique to compute spectrograms using Frequency Domain Linear Prediction (FDLP) that uses all-pole models to fit the squared Hilbert envelope of speech in different frequency sub-bands.

Automatic Speech Recognition (ASR) +1
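
FDLP swaps the usual roles of time and frequency: linear prediction is applied to the cosine transform of the signal, so the resulting all-pole model traces the squared Hilbert (temporal) envelope within each sub-band. A minimal NumPy/SciPy sketch, assuming the standard recipe with uniform DCT-domain bands; the band count, model order, and function names are illustrative choices, not the released code:

```python
import numpy as np
from scipy.fft import dct
from scipy.linalg import solve_toeplitz

def lpc(x, order):
    """Autocorrelation-method linear prediction coefficients."""
    r = np.correlate(x, x, mode="full")[len(x) - 1: len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])   # Yule-Walker equations
    err = r[0] - np.dot(a, r[1:])                 # prediction error power
    return np.concatenate(([1.0], -a)), err

def fdlp_envelopes(signal, n_bands=8, order=40, n_points=512):
    c = dct(signal, type=2, norm="ortho")          # frequency-domain signal
    edges = np.linspace(0, len(c), n_bands + 1, dtype=int)
    envs = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        a, err = lpc(c[lo:hi], order)
        # Because LP runs in the DCT (frequency) domain, the pole spectrum
        # of the all-pole model is a *temporal* envelope estimate for the band.
        w = np.exp(-1j * np.pi * np.arange(n_points)[:, None]
                   * np.arange(order + 1) / n_points)
        envs.append(err / np.abs(w @ a) ** 2)
    return np.stack(envs)                          # (n_bands, n_points)
```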

Blind Signal Dereverberation for Machine Speech Recognition

no code implementations · 30 Sep 2022 Samik Sadhu, Hynek Hermansky

We present a method to remove the unknown convolutive noise introduced into speech by reverberation in the recording environment, using some training speech data from the reverberant environment together with any available non-reverberant speech data.

Automatic Speech Recognition (ASR) +2
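
The abstract does not spell out the algorithm, so the sketch below shows only a classical baseline in the same family: since a convolutive distortion is approximately additive in the log-spectral domain, the average log-spectral offset between the reverberant and clean corpora can be estimated and subtracted. All names and frame settings here are assumptions, not the paper's method.

```python
import numpy as np
from scipy.signal import stft, istft

def mean_log_spectrum(utterances, fs=16000, nperseg=512):
    # Corpus-level average log power spectrum, one value per frequency bin.
    logs = []
    for x in utterances:
        _, _, Z = stft(x, fs=fs, nperseg=nperseg)
        logs.append(np.log(np.abs(Z) ** 2 + 1e-10).mean(axis=1))
    return np.mean(logs, axis=0)

def dereverb(x, offset, fs=16000, nperseg=512):
    # Subtract the estimated convolutive offset in the log-spectral domain.
    _, _, Z = stft(x, fs=fs, nperseg=nperseg)
    mag2 = np.abs(Z) ** 2
    corrected = np.exp(np.log(mag2 + 1e-10) - offset[:, None])
    Z_hat = np.sqrt(corrected) * np.exp(1j * np.angle(Z))  # keep noisy phase
    _, x_hat = istft(Z_hat, fs=fs, nperseg=nperseg)
    return x_hat

# offset = mean_log_spectrum(reverb_set) - mean_log_spectrum(clean_set)
```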

Stabilized training of joint energy-based models and their practical applications

no code implementations · 7 Mar 2023 Martin Sustek, Samik Sadhu, Lukas Burget, Hynek Hermansky, Jesus Villalba, Laureano Moro-Velazquez, Najim Dehak

The JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD).
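
For concreteness, a minimal sketch of drawing SGLD negative samples from an energy model with $p(x) \propto \exp(-E(x))$; the step size, noise scale, and step count are illustrative, and the stabilization tricks the paper is about (e.g. a replay buffer) are deliberately omitted here.

```python
import torch

def sgld_sample(energy_fn, x, n_steps=20, step_size=1.0, noise_scale=0.01):
    """Draw negative samples from p(x) ~ exp(-E(x)) via Langevin dynamics."""
    x = x.clone().detach().requires_grad_(True)
    for _ in range(n_steps):
        grad = torch.autograd.grad(energy_fn(x).sum(), x)[0]
        # Langevin update: move downhill in energy, plus Gaussian noise.
        x = (x - 0.5 * step_size * grad
             + noise_scale * torch.randn_like(x)).detach().requires_grad_(True)
    return x.detach()
```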

Self-supervised Learning with Speech Modulation Dropout

no code implementations · 22 Mar 2023 Samik Sadhu, Hynek Hermansky

We show that training a multi-headed self-attention-based deep network to predict deleted, information-dense 2-8 Hz speech modulations over a 1.5-second section of a speech utterance is an effective way to make machines learn to extract speech modulations using time-domain contextual information.

Automatic Speech Recognition · Self-Supervised Learning +2
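
The input corruption itself can be illustrated with a band-stop filter on frame-level feature trajectories: assuming a 100 frames/s feature rate, deleting the 2-8 Hz modulation band looks roughly like the sketch below. The frame rate and filter order are assumptions, and the self-attention predictor is not shown.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def drop_modulations(feats, frame_rate=100.0, band=(2.0, 8.0), order=4):
    """feats: (n_frames, n_dims) array of frame-level speech features."""
    nyq = frame_rate / 2.0
    b, a = butter(order, [band[0] / nyq, band[1] / nyq], btype="bandstop")
    # Filter each feature dimension along the time (frame) axis.
    return filtfilt(b, a, feats, axis=0)

# Training target: predict the removed 2-8 Hz component, roughly
# feats - drop_modulations(feats), from 1.5-second context windows.
```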
