Search Results for author: Kyu J. Han

Found 13 papers, 6 papers with code

E-Branchformer: Branchformer with Enhanced merging for speech recognition

1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).
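
As a rough illustration of the sequential convolution-plus-self-attention pattern described above, here is a minimal PyTorch sketch of a Conformer-style block; the module sizes, the SiLU activation, and the omission of the feed-forward "macaron" halves are illustrative simplifications, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class ConformerStyleBlock(nn.Module):
    """Minimal sketch: self-attention (global context) followed by a
    depthwise convolution module (local context), each with a residual
    connection. Feed-forward halves of the real block are omitted."""
    def __init__(self, d_model=256, n_heads=4, kernel_size=15):
        super().__init__()
        self.attn_norm = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.conv_norm = nn.LayerNorm(d_model)
        # Depthwise 1D convolution over time captures local patterns.
        self.depthwise = nn.Conv1d(d_model, d_model, kernel_size,
                                   padding=kernel_size // 2, groups=d_model)
        self.activation = nn.SiLU()

    def forward(self, x):  # x: (batch, time, d_model)
        # Global information via self-attention.
        h = self.attn_norm(x)
        a, _ = self.attn(h, h, h)
        x = x + a
        # Local information via depthwise convolution.
        c = self.conv_norm(x).transpose(1, 2)   # (batch, d_model, time)
        c = self.activation(self.depthwise(c)).transpose(1, 2)
        return x + c
```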

Automatic Speech Recognition (ASR) +1

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

1 code implementation • 5 Mar 2020 • Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan

In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.

Ranked #1 on Speaker Diarization on CALLHOME (DER metric, ignoring overlapped speech)
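
The eigengap heuristic underlying the auto-tuning can be sketched in a few lines of NumPy. Note this is an assumption-laden simplification: it shows only the eigengap-based speaker-count estimate, while the paper's normalized maximum eigengap (NME) criterion additionally auto-tunes the affinity-matrix pruning parameter.

```python
import numpy as np

def estimate_num_speakers(affinity, max_speakers=8):
    """Estimate the number of clusters from the eigengap of the
    unnormalized graph Laplacian of a symmetric affinity matrix.
    Sketch of the classic eigengap heuristic, not the full NME-SC
    procedure from the paper."""
    degree = np.diag(affinity.sum(axis=1))
    laplacian = degree - affinity
    eigvals = np.linalg.eigvalsh(laplacian)       # ascending order
    gaps = np.diff(eigvals[:max_speakers + 1])    # successive eigengaps
    return int(np.argmax(gaps)) + 1               # largest gap -> cluster count
```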

Clustering, Speaker Diarization +1

State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions

1 code implementation • 1 Oct 2019 • Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma

Self-attention has been a huge success for many downstream tasks in NLP, which has led to the exploration of applying self-attention to speech problems as well.
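
A minimal PyTorch sketch of the multi-stream idea named in the title: each stream applies a 1D convolution at its own dilation rate before self-attention, and the stream outputs are then combined. The dilation rates, layer counts, and the averaging combiner are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class MultiStreamSelfAttention(nn.Module):
    """Sketch: parallel streams, each a dilated 1D convolution
    (a different temporal resolution per stream) followed by
    self-attention; stream outputs are averaged."""
    def __init__(self, d_model=256, n_heads=4, dilations=(1, 2, 3)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(d_model, d_model, kernel_size=3,
                      dilation=d, padding=d) for d in dilations)
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in dilations)

    def forward(self, x):  # x: (batch, time, d_model)
        outs = []
        for conv, attn in zip(self.convs, self.attns):
            h = conv(x.transpose(1, 2)).transpose(1, 2)
            h, _ = attn(h, h, h)
            outs.append(h)
        return torch.stack(outs).mean(dim=0)  # combine streams
```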

Speech Recognition

On the Use of External Data for Spoken Named Entity Recognition

1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?

Knowledge Distillation, Named Entity Recognition +6

The CAPIO 2017 Conversational Speech Recognition System

no code implementations • 29 Dec 2017 • Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

This method was applied with the CallHome training corpus and improved individual system performance by 6.1% (relative) on average on the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion.

Image Classification, Speech Recognition +1

Speaker Diarization with Lexical Information

no code implementations • 13 Apr 2020 • Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

This work presents a novel approach to speaker diarization that leverages lexical information provided by automatic speech recognition.

Automatic Speech Recognition (ASR) +4

Multistream CNN for Robust Acoustic Modeling

no code implementations • 21 May 2020 • Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey

When combined with self-attentive SRU LM rescoring, multistream CNN helps ASAPP achieve the best WER of 1.75% on test-clean in LibriSpeech.

Data Augmentation, Speech Recognition +1

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

no code implementations • 21 May 2020 • Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma

In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling.

Data Augmentation, Language Modelling +2

A Review of Speaker Diarization: Recent Advances with Deep Learning

no code implementations • 24 Jan 2021 • Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when".

Retrieval, Speaker Diarization +3

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

no code implementations • 11 Jun 2021 • Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe

In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.

Automatic Speech Recognition (ASR) +4

Multi-mode Transformer Transducer with Stochastic Future Context

no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

A Multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process a longer future context to achieve higher accuracy, and when the latency budget is not flexible, the model can rely less on future context while still achieving reliable accuracy.
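
A small PyTorch sketch of how variable future context can be realized with an attention mask; the mask-building helper and the sampled context sizes are illustrative assumptions, not the paper's exact training recipe.

```python
import random
import torch

def future_context_mask(num_frames, right_context):
    """Boolean attention mask: frame i may attend to all past frames
    and at most `right_context` future frames. True = blocked,
    matching torch.nn.MultiheadAttention's attn_mask convention."""
    idx = torch.arange(num_frames)
    return idx[None, :] > idx[:, None] + right_context

# During training, a "stochastic future context" recipe would sample the
# right-context size per batch so one model learns to run at several
# latency settings; at inference, pick the size the latency budget allows.
mask = future_context_mask(100, right_context=random.choice([0, 4, 16]))
```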

Automatic Speech Recognition (ASR) +1

PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores

1 code implementation • 10 Apr 2024 • Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han

Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks.

Audio-Visual Synchronization
