Search Results for author: Nithin Rao Koluguri

Found 12 papers, 2 papers with code

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

1 code implementation • 10 Sep 2024 • Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg

We demonstrate that combining Sort Loss and PIL achieves performance competitive with state-of-the-art end-to-end diarization models trained exclusively with PIL.

speaker-diarization, Speaker Diarization

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

no code implementations • 3 Jul 2024 • Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text foundational models.

Automatic Speech Recognition, Automatic Speech Recognition (ASR) +3

BESTOW: Efficient and Streamable Speech Language Model with the Best of Two Worlds in GPT and T5

no code implementations • 28 Jun 2024 • Zhehuai Chen, He Huang, Oleksii Hrinchuk, Krishna C. Puvvada, Nithin Rao Koluguri, Piotr Żelasko, Jagadeesh Balam, Boris Ginsburg

We propose the BESTOW architecture to bring the BESt features from TwO Worlds into a single model that is highly efficient and has strong multitask capabilities.

Decoder, Language Modelling

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

no code implementations • 7 Jun 2024 • Ryan Langman, Ante Jukić, Kunal Dhawan, Nithin Rao Koluguri, Boris Ginsburg

Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS).

Speech Synthesis, Text to Speech

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, also known as audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modelling, Quantization +4

A Compact End-to-End Model with Local and Global Context for Spoken Language Identification

no code implementations • 27 Oct 2022 • Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture.

Language Identification, Spoken Language Identification

Multi-scale Speaker Diarization with Dynamic Scale Weighting

no code implementations • 30 Mar 2022 • Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.

Decoder, speaker-diarization +1
