Search Results for author: Kwangyoun Kim

Found 16 papers, 6 papers with code

Improving ASR Contextual Biasing with Guided Attention

no code implementations • 16 Jan 2024 • Jiyang Tang, Kwangyoun Kim, Suwon Shon, Felix Wu, Prashant Sridhar, Shinji Watanabe

Compared to studies with similar motivations, the proposed loss operates directly on the cross attention weights and is easier to implement.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Generative Context-aware Fine-tuning of Self-supervised Speech Models

no code implementations • 15 Dec 2023 • Suwon Shon, Kwangyoun Kim, Prashant Sridhar, Yi-Te Hsu, Shinji Watanabe, Karen Livescu

Considering the recent advances in generative large language models (LLM), we hypothesize that an LLM could generate useful context information using the preceding text.

Automatic Speech Recognition named-entity-recognition +6

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

2 code implementations • 18 May 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Structured Pruning of Self-Supervised Pre-trained Models for Speech Recognition and Understanding

1 code implementation • 27 Feb 2023 • Yifan Peng, Kwangyoun Kim, Felix Wu, Prashant Sridhar, Shinji Watanabe

Self-supervised speech representation learning (SSL) has been shown to be effective in various downstream tasks, but SSL models are usually large and slow.

Model Compression Representation Learning +2

Context-aware Fine-tuning of Self-supervised Speech Models

no code implementations • 16 Dec 2022 • Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

E-Branchformer: Branchformer with Enhanced merging for speech recognition

1 code implementation • 30 Sep 2022 • Kwangyoun Kim, Felix Wu, Yifan Peng, Jing Pan, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

Conformer, combining convolution and self-attention sequentially to capture both local and global information, has shown remarkable performance and is currently regarded as the state-of-the-art for automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

SRU++: Pioneering Fast Recurrence with Attention for Speech Recognition

no code implementations • 11 Oct 2021 • Jing Pan, Tao Lei, Kwangyoun Kim, Kyu Han, Shinji Watanabe

The Transformer architecture has been well adopted as a dominant architecture in most sequence transduction tasks including automatic speech recognition (ASR), since its attention mechanism excels in capturing long-range dependencies.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Multi-mode Transformer Transducer with Stochastic Future Context

no code implementations • 17 Jun 2021 • Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe

A Multi-mode ASR model can fulfill various latency requirements during inference -- when a larger latency becomes acceptable, the model can process longer future context to achieve higher accuracy and when a latency budget is not flexible, the model can be less dependent on future context but still achieve reliable accuracy.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Small energy masking for improved neural network training for end-to-end speech recognition

no code implementations • 15 Feb 2020 • Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi

More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold.

speech-recognition Speech Recognition
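
The masking rule quoted in the entry above (mask a time-frequency bin when its filterbank energy is below a threshold) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mask_small_energy`, the absolute `threshold` parameter, and the `fill_value` used for masked bins are all assumptions made for illustration, and the snippet does not say how the threshold is chosen or what value masked bins receive.

```python
def mask_small_energy(fbank_energy, threshold, fill_value=0.0):
    """Return a copy of a [frames][bins] energy matrix with small bins masked.

    Hypothetical helper: `threshold` and `fill_value` are illustrative
    parameters, not values taken from the paper.
    """
    masked = []
    for frame in fbank_energy:
        # Keep a bin if its energy reaches the threshold, otherwise mask it.
        masked.append([e if e >= threshold else fill_value for e in frame])
    return masked


# Two frames with two filterbank bins each; the low-energy bins are masked.
frames = [[1.0, 0.01], [0.5, 0.001]]
print(mask_small_energy(frames, threshold=0.1))
```

In this sketch the input is left unmodified and a new matrix is returned, so the same features can be reused with a different threshold.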

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

no code implementations • 28 Dec 2019 • Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models.

Language Modelling Multi-Task Learning

power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition

no code implementations • 22 Dec 2019 • Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda

With the power function-based MUD, we apply a power-function based nonlinearity where power function coefficients are chosen to maximize the likelihood assuming that nonlinearity outputs follow the uniform distribution.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

end-to-end training of a large vocabulary end-to-end speech recognition system

no code implementations • 22 Dec 2019 • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda

Our end-to-end speech recognition system built using this training infrastructure showed a 2.44% WER on test-clean of the LibriSpeech test set after applying shallow fusion with a Transformer language model (LM).

Data Augmentation Language Modelling +2
