Search Results for author: Tae Jin Park

Found 9 papers, 1 paper with code

The CHiME-7 Challenge: System Description and Performance of NeMo Team's DASR System

no code implementations · 18 Oct 2023 · Tae Jin Park, He Huang, Ante Jukic, Kunal Dhawan, Krishna C. Puvvada, Nithin Koluguri, Nikolay Karpov, Aleksandr Laptev, Jagadeesh Balam, Boris Ginsburg

We present the NVIDIA NeMo team's multi-channel speech recognition system for the 7th CHiME Challenge Distant Automatic Speech Recognition (DASR) Task, focusing on the development of a multi-channel, multi-speaker speech recognition system tailored to transcribe speech from distributed microphones and microphone arrays.

Automatic Speech Recognition, speaker-diarization, +3

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

no code implementations · 11 Sep 2023 · Tae Jin Park, Kunal Dhawan, Nithin Koluguri, Jagadeesh Balam

In addition, these findings point to the potential of using LLMs to improve speaker diarization and other speech processing tasks by capturing semantic and contextual cues.

speaker-diarization

Multi-scale Speaker Diarization with Dynamic Scale Weighting

no code implementations · 30 Mar 2022 · Tae Jin Park, Nithin Rao Koluguri, Jagadeesh Balam, Boris Ginsburg

First, we use multi-scale clustering as an initialization to estimate the number of speakers and obtain the average speaker representation vector for each speaker and each scale.

Decoder, speaker-diarization, +1

Tackling Dynamics in Federated Incremental Learning with Variational Embedding Rehearsal

no code implementations · 19 Oct 2021 · Tae Jin Park, Kenichi Kumatani, Dimitrios Dimitriadis

Federated Learning is a fast growing area of ML where the training datasets are extremely distributed, all while dynamically changing over time.

Federated Learning, Incremental Learning

A Review of Speaker Diarization: Recent Advances with Deep Learning

no code implementations · 24 Jan 2021 · Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when".

Retrieval, speaker-diarization, +3

Speaker Diarization with Lexical Information

no code implementations · 13 Apr 2020 · Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.

Automatic Speech Recognition (ASR), +4

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

1 code implementation · 5 Mar 2020 · Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan

In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.

Ranked #1 on Speaker Diarization on CALLHOME (DER(ig olp) metric)

Clustering, speaker-diarization, +1

Speaker Diarization With Lexical Information

no code implementations · 27 Nov 2018 · Tae Jin Park, Kyu Han, Ian Lane, Panayiotis Georgiou

This work presents a novel approach to leverage lexical information for speaker diarization.

Clustering, speaker-diarization, +1
