no code implementations • 16 Jan 2024 • Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe
Compared to studies with similar motivations, the proposed loss operates directly on the cross-attention weights and is easier to implement.
Automatic Speech Recognition (ASR) +1
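A loss applied directly to cross-attention weights can be sketched as follows. This is an illustrative guided-attention-style formulation that penalizes attention mass far from a monotonic alignment; the paper's exact loss may differ, and `sigma` is an assumed hyperparameter:

```python
import numpy as np

def cross_attention_alignment_loss(attn, sigma=0.2):
    """Illustrative loss on a cross-attention matrix.

    attn: (T_out, T_in) attention weights, rows summing to 1.
    Penalizes attention mass far from the diagonal, encouraging a
    roughly monotonic speech-to-text alignment (a sketch, not the
    paper's exact formulation).
    """
    T_out, T_in = attn.shape
    n = np.arange(T_out)[:, None] / T_out  # normalized output position
    t = np.arange(T_in)[None, :] / T_in    # normalized input position
    # penalty grows with distance from the diagonal alignment
    w = 1.0 - np.exp(-((n - t) ** 2) / (2 * sigma ** 2))
    return float(np.mean(attn * w))
```

Because the loss is just an element-wise product with the attention matrix, it drops into training without any extra alignment machinery, which is consistent with the "easier to implement" claim.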
no code implementations • 15 Dec 2023 • Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu
Considering the recent advances in generative large language models (LLMs), we hypothesize that an LLM could generate useful context information using the preceding text.
2 code implementations • 18 May 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe
Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).
Automatic Speech Recognition (ASR) +2
1 code implementation • 27 Feb 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe
Self-supervised speech representation learning (SSL) has been shown to be effective in various downstream tasks, but SSL models are usually large and slow.
no code implementations • 16 Dec 2022 • Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe
During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.
Automatic Speech Recognition (ASR) +5
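The auxiliary loss described above can be sketched with a cosine-similarity formulation: the segment's context embedding is pulled toward the context vectors of its neighboring segments. The cosine form is an assumption for illustration; the paper's actual similarity measure may differ:

```python
import numpy as np

def cosine(a, b):
    # cosine similarity between two embedding vectors
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def context_consistency_loss(ctx, left, right):
    """Illustrative auxiliary loss: encourage a segment's context
    embedding `ctx` to be similar to the context vectors of the
    surrounding segments (hypothetical cosine-based sketch)."""
    return 1.0 - 0.5 * (cosine(ctx, left) + cosine(ctx, right))
```

The loss is zero when the segment's context vector matches its neighbors exactly and grows as the vectors diverge, encouraging smoothly varying context across adjacent segments.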
1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).
Ranked #9 on Speech Recognition on LibriSpeech test-other
Automatic Speech Recognition (ASR) +1
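The sequential convolution-plus-self-attention structure that the Conformer description refers to can be sketched as a minimal block. This is a structural illustration only: weights, the macaron feed-forward modules, and layer normalization are omitted, and the depthwise convolution is stood in for by a moving average:

```python
import numpy as np

def self_attention(x):
    # single-head dot-product attention with no learned weights
    # (illustrative stand-in for the multi-head attention module)
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def depthwise_conv(x, k=3):
    # per-channel moving average as a stand-in for the conv module
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)), mode="edge")
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        out[t] = xp[t:t + k].mean(axis=0)
    return out

def conformer_block(x):
    # Conformer applies self-attention (global context) and then
    # convolution (local context) sequentially, with residuals;
    # feed-forward and normalization layers omitted in this sketch
    x = x + self_attention(x)
    x = x + depthwise_conv(x)
    return x
```

The key design point is the ordering: attention first captures long-range dependencies across the whole utterance, and the convolution then refines local patterns, with residual connections preserving the input at each step.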
no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
A Multi-mode ASR model can fulfill various latency requirements during inference: when a larger latency is acceptable, the model can process longer future context to achieve higher accuracy, and when the latency budget is not flexible, the model can rely less on future context while still achieving reliable accuracy.
Automatic Speech Recognition (ASR) +1
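The latency/accuracy trade-off above is typically realized by controlling how many future frames each position may attend to. A minimal sketch of such a mask, assuming a simple per-frame right-context limit (the paper's actual masking scheme may be more elaborate):

```python
import numpy as np

def future_context_mask(T, right_context):
    """Boolean attention mask of shape (T, T): frame i may attend to
    all past frames and up to `right_context` future frames.
    Varying `right_context` at inference trades latency for accuracy
    (an illustration of the multi-mode idea)."""
    idx = np.arange(T)
    return idx[None, :] <= idx[:, None] + right_context
```

With `right_context=0` the model runs fully streaming; with a large value it approaches full-context (offline) decoding, all from the same trained weights.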
1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno
In many scenarios of a language identification task, the user will specify a small set of languages that they can speak, instead of a large set of all possible languages.
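Restricting identification to the user-specified subset can be sketched as picking the highest-scoring label within that subset rather than over the full label set. The function and variable names here are hypothetical, for illustration only:

```python
def restricted_language_id(logits, languages, allowed):
    """Pick the most likely language from only the user-specified
    subset `allowed`, ignoring all other labels (illustrative)."""
    allowed_idx = [languages.index(lang) for lang in allowed]
    best = max(allowed_idx, key=lambda i: logits[i])
    return languages[best]
```

Constraining the decision to a handful of candidate languages can improve accuracy, since confusable out-of-set languages are excluded from the comparison.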
4 code implementations • 11 Oct 2018 • Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
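The pipeline described above (a reference signal conditioning a mask over the mixture) can be sketched as follows. This is a heavily simplified stand-in: the real system learns the mask with a neural network conditioned on a speaker embedding, whereas here the embedding is a mean spectral profile and the mask is a similarity score:

```python
import numpy as np

def speaker_embedding(ref_mag):
    # stand-in "d-vector": mean spectral profile of the reference
    # signal (the real system uses a learned speaker encoder)
    return ref_mag.mean(axis=0)

def target_speaker_mask(mix_mag, d_vector):
    # illustrative mask: per-frame similarity between the mixture
    # spectrum and the reference embedding, broadcast over frequency
    sim = mix_mag @ d_vector
    sim = sim / (np.linalg.norm(mix_mag, axis=1)
                 * np.linalg.norm(d_vector) + 1e-8)
    return np.clip(sim, 0.0, 1.0)[:, None] * np.ones_like(mix_mag)

def separate(mix_mag, ref_mag):
    # apply the mask to the mixture magnitude spectrogram
    return target_speaker_mask(mix_mag, speaker_embedding(ref_mag)) * mix_mag
```

The structural point is the conditioning: the mask depends on the reference signal, so the same mixture yields different separated outputs for different target speakers.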