Search Results for author: Kyu J. Han

Found 11 papers, 3 papers with code

On the Use of External Data for Spoken Named Entity Recognition

no code implementations NAACL 2022 Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?

Knowledge Distillation named-entity-recognition +5

Multi-mode Transformer Transducer with Stochastic Future Context

no code implementations 17 Jun 2021 Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

A Multi-mode ASR model can fulfill various latency requirements during inference -- when a larger latency becomes acceptable, the model can process longer future context to achieve higher accuracy and when a latency budget is not flexible, the model can be less dependent on future context but still achieve reliable accuracy.

Automatic Speech Recognition speech-recognition

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

no code implementations 11 Jun 2021 Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe

In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.

Automatic Speech Recognition Sentiment Analysis +1

A Review of Speaker Diarization: Recent Advances with Deep Learning

no code implementations 24 Jan 2021 Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan

Speaker diarization is a task to label audio or video recordings with classes that correspond to speaker identity, or in short, a task to identify "who spoke when".

speaker-diarization Speaker Diarization +2

ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition

no code implementations 21 May 2020 Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma

In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling.

Data Augmentation speech-recognition +1

Multistream CNN for Robust Acoustic Modeling

no code implementations 21 May 2020 Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey

When combined with self-attentive SRU LM rescoring, multistream CNN contributes for ASAPP to achieve the best WER of 1.75% on test-clean in LibriSpeech.

Data Augmentation speech-recognition +1

Speaker Diarization with Lexical Information

no code implementations 13 Apr 2020 Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan

This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.

Automatic Speech Recognition speaker-diarization +2

Auto-Tuning Spectral Clustering for Speaker Diarization Using Normalized Maximum Eigengap

2 code implementations 5 Mar 2020 Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan

In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.

speaker-diarization Speaker Diarization

State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions

1 code implementation 1 Oct 2019 Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma

Self-attention has been a huge success for many downstream tasks in NLP, which led to exploration of applying self-attention to speech problems as well.

speech-recognition Speech Recognition

The CAPIO 2017 Conversational Speech Recognition System

no code implementations 29 Dec 2017 Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane

This method was applied with the CallHome training corpus and improved individual system performances by on average 6.1% (relative) against the CallHome portion of the evaluation set with no performance loss on the Switchboard portion.

Image Classification speech-recognition +1
