1 code implementation • 10 Apr 2024 • Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han
Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks.
1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).
Ranked #9 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +1
1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
1 code implementation • 19 Nov 2021 • Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Historically, speech benchmarks have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
Automatic Speech Recognition (ASR) +7
no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
A Multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process a longer future context to achieve higher accuracy, and when the latency budget is not flexible, the model can rely less on future context while still achieving reliable accuracy.
Automatic Speech Recognition (ASR) +1
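The latency trade-off described above can be pictured as an attention mask that caps how many future frames each position may attend to. The sketch below is a generic streaming-attention mask, assuming a simple `right_context` knob; the function name and parameter are illustrative, not the paper's exact mechanism:

```python
import numpy as np

def future_context_mask(T, right_context):
    """Boolean mask over a T-frame sequence: frame i may attend to frame j
    only when j <= i + right_context, i.e. all past frames plus at most
    `right_context` future frames.

    Illustrative sketch of a variable-latency attention constraint.
    """
    idx = np.arange(T)
    # mask[i, j] is True when frame i is allowed to attend to frame j.
    return idx[None, :] <= idx[:, None] + right_context
```

Setting `right_context = 0` gives a strictly causal, lowest-latency mode, while a large value approaches full-context attention.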
no code implementations • 11 Jun 2021 • Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe
In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
Automatic Speech Recognition (ASR) +4
no code implementations • 24 Jan 2021 • Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan
Speaker diarization is the task of labeling audio or video recordings with classes that correspond to speaker identity, or, in short, the task of identifying "who spoke when".
no code implementations • 21 May 2020 • Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey
When combined with self-attentive SRU LM rescoring, multistream CNN enables ASAPP to achieve the best WER of 1.75% on test-clean in LibriSpeech.
no code implementations • 21 May 2020 • Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma
In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling.
Ranked #7 on Speech Recognition on LibriSpeech test-clean
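The speed of the SRU comes from keeping only element-wise operations inside the sequential loop: every matrix multiplication depends on the input alone, so it can be batched across time. A minimal numpy sketch of the basic SRU recurrence follows; the weight names are illustrative, and this is a simplification, not the paper's full self-attentive model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(x, W, Wf, bf, Wr, br):
    """One SRU layer over a sequence x of shape (T, d)."""
    T, d = x.shape
    # Time-independent projections, computed for the whole sequence at once.
    xt = x @ W.T                  # candidate values
    f = sigmoid(x @ Wf.T + bf)    # forget gates
    r = sigmoid(x @ Wr.T + br)    # reset (highway) gates

    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        # Element-wise recurrence: no matrix multiply inside the loop.
        c = f[t] * c + (1.0 - f[t]) * xt[t]
        # Highway connection mixes the activated state with the raw input.
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h
```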
no code implementations • 13 Apr 2020 • Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
Automatic Speech Recognition (ASR) +4
1 code implementation • 5 Mar 2020 • Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.
Ranked #1 on Speaker Diarization on CALLHOME (DER, ignoring overlaps)
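One standard way to auto-select the number of speakers is the Laplacian eigengap heuristic, which the numpy-only sketch below illustrates. It is a simplification of the paper's method, which also auto-tunes an affinity binarization threshold; the function and parameter names are illustrative:

```python
import numpy as np

def estimate_num_speakers(embeddings, max_speakers=8):
    """Estimate the speaker count from per-segment embeddings (n, d)
    via the largest eigengap of the graph Laplacian."""
    # Cosine-similarity affinity matrix, clipped to be non-negative.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(unit @ unit.T, 0.0, None)

    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity

    # Eigenvalues in ascending order; the largest gap among the smallest
    # eigenvalues indicates the number of well-separated clusters.
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    gaps = np.diff(eigvals[:max_speakers + 1])
    return int(np.argmax(gaps)) + 1
```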
1 code implementation • 1 Oct 2019 • Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma
Self-attention has been highly successful for many downstream tasks in NLP, which has led to the exploration of self-attention for speech problems as well.
Ranked #24 on Speech Recognition on LibriSpeech test-clean
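The underlying mechanism is scaled dot-product self-attention over acoustic frames. The sketch below shows the generic single-head form, not the paper's specific architecture; the weight matrices are illustrative:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over frames x (T, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])        # (T, T) frame-pair scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over frames
    return weights @ v                             # weighted sum of values
```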
no code implementations • 29 Dec 2017 • Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane
This method was applied to the CallHome training corpus and improved individual system performance by 6.1% relative, on average, on the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion.