Search Results for author: Srikanth Ronanki

Found 16 papers, 1 paper with code

Generalized zero-shot audio-to-intent classification

no code implementations · 4 Nov 2023 · Veera Raghavendra Elluru, Devang Kulshreshtha, Rohit Paturi, Sravan Bodapati, Srikanth Ronanki

Our multimodal training approach improves the accuracy of zero-shot intent classification on unseen intents of SLURP by 2.75% and 18.2% for the SLURP and internal goal-oriented dialog datasets, respectively, compared to audio-only training.

Classification Goal-Oriented Dialog +5

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

no code implementations · 18 Apr 2023 · Xilai Li, Goeric Huybrechts, Srikanth Ronanki, Jeff Farris, Sravan Bodapati

Overall, our proposed model reduces the degradation of the streaming mode over the non-streaming full-contextual model from 41.7% and 45.7% to 16.7% and 26.2% on the LibriSpeech test-clean and test-other datasets respectively, while improving by a relative 15.5% WER over the previous state-of-the-art unified model.

Speech Recognition +1

Device Directedness with Contextual Cues for Spoken Dialog Systems

no code implementations · 23 Nov 2022 · Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins.

Automatic Speech Recognition (ASR) +2

Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

no code implementations · 18 Oct 2022 · Saket Dingliwal, Monica Sunkara, Sravan Bodapati, Srikanth Ronanki, Jeff Farris, Katrin Kirchhoff

End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently.

Speech Recognition
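The joint CTC-Attention objective mentioned in this abstract is conventionally an interpolation of the two losses. A minimal sketch, assuming the standard weighted-sum form; the weight value here is a typical hyperparameter choice, not a figure from the paper:

```python
# Sketch of a joint CTC-Attention training objective (assumed standard form):
# L = w * L_CTC + (1 - w) * L_attention, with w a tunable interpolation weight.
def joint_ctc_attention_loss(ctc_loss, attention_loss, ctc_weight=0.3):
    """Return the interpolated loss; ctc_weight=0.3 is an illustrative default."""
    return ctc_weight * ctc_loss + (1.0 - ctc_weight) * attention_loss
```

In practice the two component losses would come from a CTC head and an attention decoder sharing one encoder; only the weighted combination is sketched here.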

Adapting Long Context NLM for ASR Rescoring in Conversational Agents

no code implementations · 21 Apr 2021 · Ashish Shenoy, Sravan Bodapati, Monica Sunkara, Srikanth Ronanki, Katrin Kirchhoff

Neural Language Models (NLM), when trained and evaluated with context spanning multiple utterances, have been shown to consistently outperform both conventional n-gram language models and NLMs that use limited context.

Intent Classification +2
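ASR rescoring with such an NLM typically means re-ranking the first-pass N-best list by a combined acoustic-model and language-model score. A hypothetical sketch, assuming simple log-linear interpolation; the function name, weight, and scores are illustrative, not from the paper:

```python
# Hypothetical N-best rescoring with a long-context NLM. Each hypothesis
# carries its first-pass ASR score and an NLM log-probability computed over
# the current utterance plus preceding dialog turns.
def rescore_nbest(nbest, lm_weight=0.5):
    """nbest: list of (text, asr_score, nlm_score); return the best text
    under the combined score asr_score + lm_weight * nlm_score."""
    return max(nbest, key=lambda h: h[1] + lm_weight * h[2])[0]

best = rescore_nbest([
    ("set a timer", -4.0, -2.0),  # combined score: -5.0
    ("set a time",  -3.5, -4.0),  # combined score: -5.5
])
```

The long-context aspect lives entirely in how the NLM score is computed (conditioning on prior utterances); the re-ranking step itself stays this simple.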

Multimodal Semi-supervised Learning Framework for Punctuation Prediction in Conversational Speech

no code implementations · 3 Aug 2020 · Monica Sunkara, Srikanth Ronanki, Dhanush Bekal, Sravan Bodapati, Katrin Kirchhoff

Experiments conducted on the Fisher corpus show that our proposed approach achieves ~6-9% and ~3-4% absolute improvement (F1 score) over the baseline BLSTM model on reference transcripts and ASR outputs respectively.

Data Augmentation

Fine-grained robust prosody transfer for single-speaker neural text-to-speech

no code implementations · 4 Jul 2019 · Viacheslav Klimkov, Srikanth Ronanki, Jonas Rohnke, Thomas Drugman

However, when trained on a single-speaker dataset, the conventional prosody transfer systems are not robust enough to speaker variability, especially in the case of a reference signal coming from an unseen speaker.

Effect of data reduction on sequence-to-sequence neural TTS

no code implementations · 15 Nov 2018 · Javier Latorre, Jakub Lachowicz, Jaime Lorenzo-Trueba, Thomas Merritt, Thomas Drugman, Srikanth Ronanki, Klimkov Viacheslav

Recent speech synthesis systems based on sampling from autoregressive neural network models can generate speech almost indistinguishable from human recordings.

Speech Synthesis

Median-Based Generation of Synthetic Speech Durations using a Non-Parametric Approach

no code implementations · 22 Aug 2016 · Srikanth Ronanki, Oliver Watts, Simon King, Gustav Eje Henter

This paper proposes a new approach to duration modelling for statistical parametric speech synthesis in which a recurrent statistical model is trained to output a phone transition probability at each timestep (acoustic frame).

Speech Synthesis
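One plausible reading of this duration scheme: given a phone-transition probability per acoustic frame, the probability that the phone is still ongoing after frame t is the running product of (1 - p_i), and the median duration is the first frame at which that survival probability falls to 0.5. A sketch under that assumption, not a reproduction of the paper's implementation:

```python
def median_duration(transition_probs):
    """Given per-frame phone-transition probabilities, return the frame index
    at which the survival probability prod(1 - p_i) first drops to <= 0.5,
    i.e. the median of the implied duration distribution (assumed reading)."""
    survival = 1.0
    for frame, p in enumerate(transition_probs, start=1):
        survival *= 1.0 - p
        if survival <= 0.5:
            return frame
    return len(transition_probs)  # phone outlasts the modelled horizon
```

Taking the median rather than the mean of this distribution is what makes the generated durations robust to outliers, which is the non-parametric angle the title refers to.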

DNN-based Speech Synthesis for Indian Languages from ASCII text

no code implementations · 18 Aug 2016 · Srikanth Ronanki, Siva Reddy, Bajibabu Bollepalli, Simon King

These methods first convert the ASCII text to a phonetic script, and then train a Deep Neural Network to synthesize speech from it.

Speech Synthesis Test +1
