Search Results for author: Sundararajan Srinivasan

Found 8 papers, 1 with code

End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation

1 code implementation · 1 Nov 2023 · Juan Zuluaga-Gomez, Zhaocheng Huang, Xing Niu, Rohit Paturi, Sundararajan Srinivasan, Prashant Mathur, Brian Thompson, Marcello Federico

Conventional speech-to-text translation (ST) systems are trained on single-speaker utterances, and they may not generalize to real-life scenarios where the audio contains conversations by multiple speakers.

Tasks: Automatic Speech Recognition, +3 more

Speaker Diarization of Scripted Audiovisual Content

No code implementations · 4 Aug 2023 · Yogesh Virkar, Brian Thompson, Rohit Paturi, Sundararajan Srinivasan, Marcello Federico

The media localization industry usually requires a verbatim script of the final film or TV production in order to create subtitles or dubbing scripts in a foreign language.

Tasks: Speaker Diarization, +2 more

Device Directedness with Contextual Cues for Spoken Dialog Systems

No code implementations · 23 Nov 2022 · Dhanush Bekal, Sundararajan Srinivasan, Sravan Bodapati, Srikanth Ronanki, Katrin Kirchhoff

In this work, we define barge-in verification as a supervised learning task where audio-only information is used to classify user spoken dialogue into true and false barge-ins.

Tasks: Automatic Speech Recognition (ASR), +2 more

Representation learning through cross-modal conditional teacher-student training for speech emotion recognition

No code implementations · 30 Nov 2021 · Sundararajan Srinivasan, Zhaocheng Huang, Katrin Kirchhoff

To improve the efficacy of our approach, we propose a novel estimate of the quality of the emotion predictions, which is used to condition teacher-student training.

Tasks: Emotion Classification, Representation Learning, +1 more

Speaker-conversation factorial designs for diarization error analysis

No code implementations · 10 Jun 2021 · Scott Seyfarth, Sundararajan Srinivasan, Katrin Kirchhoff

Determining the cause of diarization errors is difficult because speaker voice acoustics and conversation structure co-vary, and the interactions between acoustics, conversational structure, and diarization accuracy are complex.

Tasks: Clustering, Speaker Diarization, +1 more
