Search Results for author: Dhananjaya Gowda

Found 15 papers, 1 paper with code

Time-Varying Quasi-Closed-Phase Analysis for Accurate Formant Tracking in Speech Signals

1 code implementation • 31 Aug 2023 • Dhananjaya Gowda, Sudarsana Reddy Kadiri, Brad Story, Paavo Alku

Formant tracking experiments with a wide variety of synthetic and natural speech signals show that the proposed TVQCP method performs better than conventional and popular formant tracking tools, such as Wavesurfer and Praat (based on dynamic programming), the KARMA algorithm (based on Kalman filtering), and DeepFormants (based on deep neural networks trained in a supervised manner).
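The linear-prediction analysis underlying such trackers can be illustrated with a minimal sketch: fit an all-pole model to a speech frame with the autocorrelation method, then read formant candidates off the roots of the prediction polynomial. This is generic LP formant estimation, not the TVQCP method itself; the model order, minimum frequency, and bandwidth threshold below are illustrative assumptions.

```python
import numpy as np

def lpc(x, order):
    """Autocorrelation-method LPC via the Levinson-Durbin recursion.
    Returns the prediction polynomial [1, a1, ..., a_order]."""
    n = len(x)
    r = np.array([x[: n - k] @ x[k:] for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[i - 1:0:-1]
        k = -acc / err
        a_prev = a.copy()
        a[1:i] = a_prev[1:i] + k * a_prev[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def formants(a, fs, min_freq=90.0, max_bw=400.0):
    """Convert LPC polynomial roots to formant-frequency candidates (Hz)."""
    roots = np.roots(a)
    roots = roots[np.imag(roots) > 1e-3]        # keep upper half-plane only
    freqs = np.angle(roots) * fs / (2 * np.pi)  # pole angle -> frequency
    bws = -np.log(np.abs(roots)) * fs / np.pi   # pole radius -> 3 dB bandwidth
    return sorted(f for f, b in zip(freqs, bws) if f > min_freq and b < max_bw)
```

A full tracker would run this per frame and smooth the candidates over time (the dynamic-programming step that Wavesurfer and Praat add).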

End-to-End Training of a Large Vocabulary End-to-End Speech Recognition System

no code implementations • 22 Dec 2019 • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda

Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on the LibriSpeech test-clean set after applying shallow fusion with a Transformer language model (LM).

Tasks: Data Augmentation, Language Modelling, +2
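The shallow fusion mentioned above combines the ASR model's token scores with an external LM at decoding time, without retraining either model. A minimal sketch of one decoding step, assuming per-token log-probabilities are already available (the interpolation weight 0.3 is an illustrative value, not one from the paper):

```python
import numpy as np

def shallow_fusion_step(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Fuse one decoding step: score(y) = log p_asr(y) + lm_weight * log p_lm(y).
    Returns the fused log-scores and the index of the best next token."""
    fused = asr_log_probs + lm_weight * lm_log_probs
    return fused, int(np.argmax(fused))
```

In a beam search, the fused score would rank hypothesis extensions in place of the raw ASR score, letting the LM rescue tokens that are linguistically likely but acoustically ambiguous.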

Power-Law Nonlinearity with Maximally Uniform Distribution Criterion for Improved Neural Network Training in Automatic Speech Recognition

no code implementations • 22 Dec 2019 • Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda

In the power-function-based MUD approach, we apply a power-function nonlinearity whose coefficients are chosen to maximize the likelihood under the assumption that the nonlinearity outputs follow a uniform distribution.

Tasks: Automatic Speech Recognition (ASR), +1
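The idea can be sketched as a search over candidate exponents for the one whose output distribution is closest to uniform. The sketch below scores uniformity with a Kolmogorov-Smirnov-style statistic rather than the paper's maximum-likelihood criterion, and the candidate grid is an illustrative assumption:

```python
import numpy as np

def ks_uniform(x):
    """Distance of the empirical CDF of x (min-max normalized) from uniform."""
    x = np.sort(x)
    x = (x - x[0]) / (x[-1] - x[0])
    ecdf = np.arange(1, len(x) + 1) / len(x)
    return np.max(np.abs(ecdf - x))

def best_power(x, powers):
    """Pick the power-law exponent p for which x ** p is most uniform."""
    return min(powers, key=lambda p: ks_uniform(x ** p))
```

For positive-valued features (e.g. filter-bank energies), the chosen exponent then replaces a fixed compression such as the logarithm.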

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

no code implementations • 28 Dec 2019 • Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models.

Tasks: Language Modelling, Multi-Task Learning

A review of on-device fully neural end-to-end automatic speech recognition algorithms

no code implementations • 14 Dec 2020 • Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han

Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text-normalizer, an inverse-text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on.

Tasks: Automatic Speech Recognition (ASR), +3

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

no code implementations • 4 May 2021 • Chanwoo Kim, Abhinav Garg, Dhananjaya Gowda, Seongkyu Mun, Changwoo Han

In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers.

Tasks: Speech Recognition
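The "chunkwise" half of MoChA can be illustrated in isolation: once a monotonic attention boundary has been selected, soft attention is computed only over a fixed-width window of encoder states ending at that boundary, which is what keeps decoding streamable. The monotonic boundary selection itself is omitted in this sketch, and the window width and energies are illustrative:

```python
import numpy as np

def chunkwise_attention(energies, boundary, chunk_width):
    """Softmax over the window [boundary - chunk_width + 1, boundary] only;
    every encoder state outside the chunk gets exactly zero weight."""
    start = max(0, boundary - chunk_width + 1)
    weights = np.zeros_like(energies)
    e = energies[start:boundary + 1]
    e = np.exp(e - e.max())                 # numerically stable softmax
    weights[start:boundary + 1] = e / e.sum()
    return weights
```

Because the weights never touch frames past the boundary, the decoder can emit output before the full utterance has been seen.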

Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

To improve the accuracy of a low-resource Italian ASR, we leverage a well-trained English model, unlabeled text corpus, and unlabeled audio corpus using transfer learning, TTS augmentation, and SSL respectively.

Tasks: Data Augmentation, Speech Recognition, +2

A comparison of streaming models and data augmentation methods for robust speech recognition

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

However, we observe that training of MoChA models seems to be more sensitive to various factors, such as the characteristics of the training sets and the incorporation of additional augmentation techniques.

Tasks: Data Augmentation, Robust Speech Recognition, +1

Formant Tracking Using Quasi-Closed Phase Forward-Backward Linear Prediction Analysis and Deep Neural Networks

no code implementations • 5 Jan 2022 • Dhananjaya Gowda, Bajibabu Bollepalli, Sudarsana Reddy Kadiri, Paavo Alku

Formant tracking is investigated in this study by using trackers based on dynamic programming (DP) and deep neural nets (DNNs).

Two-Pass End-to-End ASR Model Compression

no code implementations • 8 Jan 2022 • Nauman Dawalatabad, Tushar Vatsal, Ashutosh Gupta, Sungsoo Kim, Shatrughan Singh, Dhananjaya Gowda, Chanwoo Kim

With the use of popular transducer-based models, it has become possible to practically deploy streaming speech recognition models on small devices [1].

Tasks: Knowledge Distillation, Model Compression, +3

Multi-stage Progressive Compression of Conformer Transducer for On-device Speech Recognition

no code implementations • 1 Oct 2022 • Jash Rathod, Nauman Dawalatabad, Shatrughan Singh, Dhananjaya Gowda

Knowledge distillation (KD) is a popular model compression approach that has been shown to achieve a smaller model size with relatively little degradation in model performance.

Tasks: Automatic Speech Recognition (ASR), +3
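KD of this kind is typically sketched as the standard soft-target objective: the student is trained against a temperature-softened teacher distribution mixed with the usual cross-entropy on ground-truth labels. The temperature and mixing weight below are illustrative defaults, not values from the paper:

```python
import numpy as np

def log_softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
    """alpha * soft loss (cross-entropy vs. the softened teacher, scaled by T^2)
    + (1 - alpha) * hard cross-entropy on the ground-truth labels."""
    t = np.exp(log_softmax(teacher_logits / temperature))
    soft = -(t * log_softmax(student_logits / temperature)).sum(-1).mean()
    soft *= temperature ** 2
    hard = -log_softmax(student_logits)[np.arange(len(labels)), labels].mean()
    return alpha * soft + (1 - alpha) * hard
```

The T^2 scaling keeps the soft-loss gradients at a magnitude comparable to the hard loss as the temperature changes.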

Refining a Deep Learning-based Formant Tracker using Linear Prediction Methods

no code implementations • 17 Aug 2023 • Paavo Alku, Sudarsana Reddy Kadiri, Dhananjaya Gowda

The results indicated that the data-driven DeepFormants trackers outperformed the conventional trackers and that the best performance was obtained by refining the formants predicted by DeepFormants using QCP-FB analysis.

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

no code implementations • 19 Jan 2024 • Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system.

Tasks: Self-Supervised Learning
