no code implementations • 19 Sep 2023 • Krishna C. Puvvada, Nithin Rao Koluguri, Kunal Dhawan, Jagadeesh Balam, Boris Ginsburg
Discrete audio representation, also known as audio tokenization, has seen renewed interest because of its potential to let text language modeling approaches be applied in the audio domain.
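As a rough illustration of what "audio tokenization" means, the sketch below maps each continuous feature frame to the index of its nearest vector in a codebook. This turns a frame sequence into a sequence of integer tokens that a text-style language model could consume. It assumes a single, already-learned codebook: the function name `tokenize_frames` and the toy codebook are hypothetical and are not taken from the paper. Neural audio codecs often use residual vector quantization with several codebooks instead.

```python
from typing import List, Sequence


def tokenize_frames(frames: Sequence[Sequence[float]],
                    codebook: Sequence[Sequence[float]]) -> List[int]:
    """Hypothetical single-codebook quantizer: frame -> index of nearest code vector."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook entry.
        dists = [sum((f - c) ** 2 for f, c in zip(frame, code)) for code in codebook]
        # The token is the index of the closest entry.
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


# Toy 2-D codebook and frames (illustrative values only).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
frames = [[0.1, 0.2], [0.9, 1.1], [0.1, 0.8]]
print(tokenize_frames(frames, codebook))  # [0, 1, 2]
```

In practice the codebook would be learned, for example with k-means or jointly with a codec, and the frames would come from a feature extractor. The nearest-neighbour lookup is the step that makes the representation discrete.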
no code implementations • 18 Sep 2023 • Nithin Rao Koluguri, Samuel Kriman, Georgy Zelenfroind, Somshubra Majumdar, Dima Rekesh, Vahid Noroozi, Jagadeesh Balam, Boris Ginsburg
This paper presents an overview and evaluation of several end-to-end ASR models on long-form audio.
Automatic Speech Recognition
no code implementations • 8 May 2023 • Dima Rekesh, Nithin Rao Koluguri, Samuel Kriman, Somshubra Majumdar, Vahid Noroozi, He Huang, Oleksii Hrinchuk, Krishna Puvvada, Ankur Kumar, Jagadeesh Balam, Boris Ginsburg
Conformer-based models have become the dominant end-to-end architecture for speech processing tasks.
no code implementations • 27 Oct 2022 • Fei Jia, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
We introduce TitaNet-LID, a compact end-to-end neural network for Spoken Language Identification (LID) that is based on the ContextNet architecture.
no code implementations • 30 Mar 2022 • Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg
First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.
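The excerpt mentions obtaining an average speaker representation vector for each speaker and each scale. A minimal sketch of that averaging step follows. It assumes that segment embeddings have already been extracted at every scale and that a clustering step has already assigned a speaker label to each segment. The clustering itself and the estimation of the speaker count are not shown. The function name and data layout are hypothetical and do not reproduce the paper's implementation.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def average_speaker_embeddings(
    embeddings: Dict[int, Sequence[Sequence[float]]],
    labels: Dict[int, Sequence[int]],
) -> Dict[Tuple[int, int], List[float]]:
    """Hypothetical helper: mean embedding for every (scale, speaker) pair.

    embeddings[scale][i] is the embedding of segment i at that scale;
    labels[scale][i] is the speaker label clustering assigned to segment i.
    """
    sums: Dict[Tuple[int, int], List[float]] = {}
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for scale, segments in embeddings.items():
        for emb, speaker in zip(segments, labels[scale]):
            key = (scale, speaker)
            if key not in sums:
                sums[key] = [0.0] * len(emb)
            # Accumulate the element-wise sum of embeddings for this pair.
            sums[key] = [s + e for s, e in zip(sums[key], emb)]
            counts[key] += 1
    # Divide each sum by its segment count to get the average vector.
    return {key: [s / counts[key] for s in vec] for key, vec in sums.items()}


# Two scales with toy 2-D embeddings and pre-computed cluster labels.
embeddings = {0: [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]], 1: [[2.0, 2.0], [0.0, 4.0]]}
labels = {0: [0, 0, 1], 1: [0, 1]}
centroids = average_speaker_embeddings(embeddings, labels)
print(centroids[(0, 0)])  # [2.0, 0.0]
```

With the averages keyed by `(scale, speaker)`, the number of distinct speakers found by clustering is simply the number of distinct speaker labels among the keys.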
2 code implementations • 8 Oct 2021 • Nithin Rao Koluguri, Taejin Park, Boris Ginsburg
In this paper, we propose TitaNet, a novel neural network architecture for extracting speaker representations.
Ranked #1 on Speaker Diarization on CH109