Search Results for author: Abhinav Garg

Found 9 papers, 0 papers with code

End-to-end training of a large vocabulary end-to-end speech recognition system

no code implementations • 22 Dec 2019 • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda

Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on test-clean of the LibriSpeech test set after applying shallow fusion with a Transformer language model (LM).

Data Augmentation Language Modelling +2

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

no code implementations • 28 Dec 2019 • Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models.

Language Modelling Multi-Task Learning

Knowledge synthesis from 100 million biomedical documents augments the deep expression profiling of coronavirus receptors

no code implementations • 28 Mar 2020 • AJ Venkatakrishnan, Arjun Puranik, Akash Anand, David Zemmour, Xiang Yao, Xiaoying Wu, Ramakrishna Chilaka, Dariusz K. Murakowski, Kristopher Standish, Bharathwaj Raghunathan, Tyler Wagner, Enrique Garcia-Rivera, Hugo Solomon, Abhinav Garg, Rakesh Barve, Anuli Anyanwu-Ofili, Najat Khan, Venky Soundararajan

The COVID-19 pandemic demands assimilation of all available biomedical knowledge to decode its mechanisms of pathogenicity and transmission.

A review of on-device fully neural end-to-end automatic speech recognition algorithms

no code implementations • 14 Dec 2020 • Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han

Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text-normalizer, an inverse-text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on.

Automatic Speech Recognition (ASR) +3

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

no code implementations • 4 May 2021 • Chanwoo Kim, Abhinav Garg, Dhananjaya Gowda, Seongkyu Mun, Changwoo Han

In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoCha) jointly trained with enhancement layers.

Speech Recognition

Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

To improve the accuracy of a low-resource Italian ASR, we leverage a well-trained English model, unlabeled text corpus, and unlabeled audio corpus using transfer learning, TTS augmentation, and SSL respectively.

Data Augmentation Speech Recognition +2

A comparison of streaming models and data augmentation methods for robust speech recognition

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

However, we observe that training of MoChA models seems to be more sensitive to various factors such as the characteristics of training sets and the incorporation of additional augmentation techniques.

Data Augmentation Robust Speech Recognition +1

Distribution Shift in Airline Customer Behavior during COVID-19

no code implementations • 29 Nov 2021 • Abhinav Garg, Naman Shukla, Lavanya Marla, Sriram Somanchi

Traditional AI approaches in customized (personalized) contextual pricing applications assume that the data distribution at the time of online pricing is similar to that observed during training.

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

no code implementations • 19 Jan 2024 • Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system.

Self-Supervised Learning
