Search Results for author: Kunal Dhawan

Found 20 papers, 7 papers with code

Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens

1 code implementation • 10 Sep 2024 • Taejin Park, Ivan Medennikov, Kunal Dhawan, Weiqing Wang, He Huang, Nithin Rao Koluguri, Krishna C. Puvvada, Jagadeesh Balam, Boris Ginsburg

We demonstrate that combining Sort Loss and PIL achieves performance competitive with state-of-the-art end-to-end diarization models trained exclusively with PIL.

Speaker Diarization

Codec-ASR: Training Performant Automatic Speech Recognition Systems with Discrete Speech Representations

no code implementations • 3 Jul 2024 • Kunal Dhawan, Nithin Rao Koluguri, Ante Jukić, Ryan Langman, Jagadeesh Balam, Boris Ginsburg

Discrete speech representations have garnered recent attention for their efficacy in training transformer-based models for various speech-related tasks such as automatic speech recognition (ASR), translation, speaker verification, and joint speech-text foundational models.

Automatic Speech Recognition (ASR), +3 more

Spectral Codecs: Spectrogram-Based Audio Codecs for High Quality Speech Synthesis

no code implementations • 7 Jun 2024 • Ryan Langman, Ante Jukić, Kunal Dhawan, Nithin Rao Koluguri, Boris Ginsburg

Recently, discrete audio tokens produced by neural audio codecs have become a popular alternate speech representation for speech synthesis tasks such as text-to-speech (TTS).

Speech Synthesis, Text-to-Speech, +1 more

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations • 18 Oct 2023 • Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition, Speaker Diarization, +3 more

Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition

no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg

Discrete audio representation, a.k.a. audio tokenization, has seen renewed interest driven by its potential to facilitate the application of text language modeling approaches in the audio domain.

Language Modeling, +5 more

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

1 code implementation • 11 Sep 2023 • Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech processing tasks by capturing semantic and contextual cues.

Speaker Diarization

Unified model for code-switching speech recognition and language identification based on a concatenated tokenizer

1 code implementation • 14 Jun 2023 • Kunal Dhawan, Dima Rekesh, Boris Ginsburg

Code-Switching (CS) multilingual Automatic Speech Recognition (ASR) models can transcribe speech containing two or more alternating languages during a conversation.

Automatic Speech Recognition (ASR), +3 more

Phonetic Word Embeddings

1 code implementation • 30 Sep 2021 • Rahul Sharma, Kunal Dhawan, Balakrishna Pailla

This work presents a novel methodology for calculating the phonetic similarity between words taking motivation from the human perception of sounds.

Benchmarking, Word Embeddings

Joint Language Identification of Code-Switching Speech using Attention based E2E Network

no code implementations • 15 Jul 2019 • Sreeram Ganji, Kunal Dhawan, Kumar Priyadarshi, Rohit Sinha

For the automatic recognition of code-switching speech, the conventional approaches often employ an LID system for detecting the languages present within an utterance.

Language Identification
