1 code implementation • 10 Apr 2024 • Lucas Goncalves, Prashant Mathur, Chandrashekhar Lavania, Metehan Cekic, Marcello Federico, Kyu J. Han
Recent advancements in audio-visual generative modeling have been propelled by progress in deep learning and the availability of data-rich benchmarks.
1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).
Ranked #9 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +1
1 code implementation • NAACL 2022 • Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han
In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?
1 code implementation • 19 Nov 2021 • Suwon Shon, Ankita Pasad, Felix Wu, Pablo Brusco, Yoav Artzi, Karen Livescu, Kyu J. Han
Historically, speech benchmarks have focused on automatic speech recognition (ASR), speaker identification, or other lower-level tasks.
Ranked #1 on Named Entity Recognition (NER) on SLUE
Automatic Speech Recognition (ASR) +7
no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
A Multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process a longer future context to achieve higher accuracy, and when the latency budget is not flexible, the model can rely less on future context while still achieving reliable accuracy.
Automatic Speech Recognition (ASR) +1
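The latency trade-off described above can be pictured as an attention mask that caps how many future frames each position may attend to. The sketch below is a generic streaming-attention mask, assuming a simple `right_context` knob; the function name and parameter are illustrative, not the paper's exact mechanism:

```python
import numpy as np

def future_context_mask(T, right_context):
    """Boolean mask over a T-frame sequence: frame i may attend to frame j
    only when j <= i + right_context, i.e. all past frames plus at most
    `right_context` future frames.

    Illustrative sketch of a variable-latency attention constraint.
    """
    idx = np.arange(T)
    # mask[i, j] is True when frame i is allowed to attend to frame j.
    return idx[None, :] <= idx[:, None] + right_context
```

Setting `right_context = 0` gives a strictly causal, lowest-latency mode, while a large value approaches full-context attention.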
no code implementations • 11 Jun 2021 • Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe
In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.
Automatic Speech Recognition (ASR) +4
no code implementations • 24 Jan 2021 • Tae Jin Park, Naoyuki Kanda, Dimitrios Dimitriadis, Kyu J. Han, Shinji Watanabe, Shrikanth Narayanan
Speaker diarization is the task of labeling audio or video recordings with classes that correspond to speaker identity, or, in short, the task of identifying "who spoke when".
no code implementations • 21 May 2020 • Kyu J. Han, Jing Pan, Venkata Krishna Naveen Tadala, Tao Ma, Dan Povey
When combined with self-attentive SRU LM rescoring, multistream CNN enables ASAPP to achieve the best WER of 1.75% on test-clean in LibriSpeech.
no code implementations • 21 May 2020 • Jing Pan, Joshua Shapiro, Jeremy Wohlwend, Kyu J. Han, Tao Lei, Tao Ma
In this paper we present state-of-the-art (SOTA) performance on the LibriSpeech corpus with two novel neural network architectures, a multistream CNN for acoustic modeling and a self-attentive simple recurrent unit (SRU) for language modeling.
Ranked #7 on Speech Recognition on LibriSpeech test-clean
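The speed of the SRU comes from keeping only element-wise operations inside the sequential loop: every matrix multiplication depends on the input alone, so it can be batched across time. A minimal numpy sketch of the basic SRU recurrence follows; the weight names are illustrative, and this is a simplification, not the paper's full self-attentive model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sru_forward(x, W, Wf, bf, Wr, br):
    """One SRU layer over a sequence x of shape (T, d)."""
    T, d = x.shape
    # Time-independent projections, computed for the whole sequence at once.
    xt = x @ W.T                  # candidate values
    f = sigmoid(x @ Wf.T + bf)    # forget gates
    r = sigmoid(x @ Wr.T + br)    # reset (highway) gates

    c = np.zeros(d)
    h = np.empty((T, d))
    for t in range(T):
        # Element-wise recurrence: no matrix multiply inside the loop.
        c = f[t] * c + (1.0 - f[t]) * xt[t]
        # Highway connection mixes the activated state with the raw input.
        h[t] = r[t] * np.tanh(c) + (1.0 - r[t]) * x[t]
    return h
```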
no code implementations • 13 Apr 2020 • Tae Jin Park, Kyu J. Han, Jing Huang, Xiaodong He, Bo-Wen Zhou, Panayiotis Georgiou, Shrikanth Narayanan
This work presents a novel approach for speaker diarization to leverage lexical information provided by automatic speech recognition.
Automatic Speech Recognition (ASR) +4
1 code implementation • 5 Mar 2020 • Tae Jin Park, Kyu J. Han, Manoj Kumar, Shrikanth Narayanan
In this study, we propose a new spectral clustering framework that can auto-tune the parameters of the clustering algorithm in the context of speaker diarization.
Ranked #1 on Speaker Diarization on CALLHOME (DER, ignoring overlaps)
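One standard way to auto-select the number of speakers is the Laplacian eigengap heuristic, which the numpy-only sketch below illustrates. It is a simplification of the paper's method, which also auto-tunes an affinity binarization threshold; the function and parameter names are illustrative:

```python
import numpy as np

def estimate_num_speakers(embeddings, max_speakers=8):
    """Estimate the speaker count from per-segment embeddings (n, d)
    via the largest eigengap of the graph Laplacian."""
    # Cosine-similarity affinity matrix, clipped to be non-negative.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    affinity = np.clip(unit @ unit.T, 0.0, None)

    # Unnormalized graph Laplacian L = D - A.
    laplacian = np.diag(affinity.sum(axis=1)) - affinity

    # Eigenvalues in ascending order; the largest gap among the smallest
    # eigenvalues indicates the number of well-separated clusters.
    eigvals = np.sort(np.linalg.eigvalsh(laplacian))
    gaps = np.diff(eigvals[:max_speakers + 1])
    return int(np.argmax(gaps)) + 1
```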
1 code implementation • 1 Oct 2019 • Kyu J. Han, Ramon Prieto, Kaixing Wu, Tao Ma
Self-attention has been highly successful for many downstream tasks in NLP, which has led to the exploration of self-attention for speech problems as well.
Ranked #24 on Speech Recognition on LibriSpeech test-clean
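The underlying mechanism is scaled dot-product self-attention over acoustic frames. The sketch below shows the generic single-head form, not the paper's specific architecture; the weight matrices are illustrative:

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over frames x (T, d)."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[1])        # (T, T) frame-pair scores
    scores -= scores.max(axis=1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # softmax over frames
    return weights @ v                             # weighted sum of values
```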
no code implementations • 29 Dec 2017 • Kyu J. Han, Akshay Chandrashekaran, Jungsuk Kim, Ian Lane
This method was applied to the CallHome training corpus and improved individual system performance by 6.1% relative, on average, on the CallHome portion of the evaluation set, with no performance loss on the Switchboard portion.