Search Results for author: Suyoun Kim

Found 20 papers, 3 papers with code

Augmenting text for spoken language understanding with Large Language Models

no code implementations17 Sep 2023 Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer

Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains, respectively.

Semantic Parsing, Spoken Language Understanding

Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding

no code implementations22 Jul 2023 Suyoun Kim, Akshat Shrivastava, Duc Le, Ju Lin, Ozlem Kalinli, Michael L. Seltzer

End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse directly from speech have recently become increasingly promising.

Speech Recognition +1

Joint Audio/Text Training for Transformer Rescorer of Streaming Speech Recognition

no code implementations31 Oct 2022 Suyoun Kim, Ke Li, Lucas Kabela, Rongqing Huang, Jiedan Zhu, Ozlem Kalinli, Duc Le

In this work, we present our Joint Audio/Text training method for Transformer Rescorer, to leverage unpaired text-only data which is relatively cheaper than paired audio-text data.
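
A minimal sketch of the idea, under assumed names and dimensions: one rescorer trains on both paired audio-text batches and cheaper text-only batches, substituting a learned placeholder for the audio context when no audio is available (the `JointRescorer` class and `audio_placeholder` are illustrative, not the paper's exact setup).

```python
# Sketch of joint audio/text training for a Transformer rescorer.
import torch
import torch.nn as nn

class JointRescorer(nn.Module):
    def __init__(self, vocab=1000, d=256):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        # Learned placeholder stands in for audio features on text-only data.
        self.audio_placeholder = nn.Parameter(torch.zeros(1, 1, d))
        self.out = nn.Linear(d, vocab)

    def forward(self, tokens, audio_emb=None):
        x = self.embed(tokens)                      # (B, T, d)
        if audio_emb is None:                       # text-only batch
            audio_emb = self.audio_placeholder.expand(x.size(0), 1, -1)
        x = torch.cat([audio_emb, x], dim=1)        # prepend audio context
        h = self.encoder(x)[:, audio_emb.size(1):]  # drop audio positions
        return self.out(h)                          # per-token rescoring logits

model = JointRescorer()
paired = model(torch.randint(0, 1000, (2, 10)), audio_emb=torch.randn(2, 5, 256))
text_only = model(torch.randint(0, 1000, (2, 10)))  # cheap unpaired text
```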

Speech Recognition

Deliberation Model for On-Device Spoken Language Understanding

no code implementations4 Apr 2022 Duc Le, Akshat Shrivastava, Paden Tomasello, Suyoun Kim, Aleksandr Livshits, Ozlem Kalinli, Michael L. Seltzer

We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU), where a streaming automatic speech recognition (ASR) model produces the first-pass hypothesis and a second-pass natural language understanding (NLU) component generates the semantic parse by conditioning on both ASR's text and audio embeddings.
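
As a hedged illustration of the deliberation idea (shapes and module choices are assumptions, not the paper's architecture), the second-pass decoder can cross-attend over the concatenation of the first-pass text embeddings and the audio embeddings:

```python
# Second pass attends jointly over ASR text embeddings and audio embeddings.
import torch
import torch.nn as nn

d = 256
text_emb = torch.randn(2, 12, d)    # embeddings of the first-pass ASR hypothesis
audio_emb = torch.randn(2, 50, d)   # encoder outputs of the streaming ASR model
memory = torch.cat([text_emb, audio_emb], dim=1)  # joint deliberation source

decoder_layer = nn.TransformerDecoderLayer(d, nhead=4, batch_first=True)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=2)

parse_prefix = torch.randn(2, 8, d)  # embedded semantic-parse prefix (teacher forcing)
out = decoder(tgt=parse_prefix, memory=memory)  # (2, 8, d); project to parse vocab
```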

Automatic Speech Recognition (ASR) +3

Improving RNN Transducer Based ASR with Auxiliary Tasks

1 code implementation5 Nov 2020 Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers.

Automatic Speech Recognition (ASR) +1

Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer

no code implementations26 Oct 2020 Suyoun Kim, Yuan Shangguan, Jay Mahadeokar, Antoine Bruguier, Christian Fuegen, Michael L. Seltzer, Duc Le

Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM) and cannot easily leverage unpaired text data during training.
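
For context, the standard baseline the paper improves on is shallow fusion, which interpolates the ASR and external-LM log-probabilities at decoding time; the weight value below is illustrative.

```python
# Shallow fusion: combine per-token log-probs from ASR and an external LM.
import torch

def fused_score(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Interpolated score used to rank beam-search hypotheses."""
    return asr_log_probs + lm_weight * lm_log_probs

asr_lp = torch.log_softmax(torch.randn(5, 1000), dim=-1)  # (steps, vocab)
lm_lp = torch.log_softmax(torch.randn(5, 1000), dim=-1)
scores = fused_score(asr_lp, lm_lp)
```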

Language Modelling, Speech Recognition +1

Cross-Attention End-to-End ASR for Two-Party Conversations

no code implementations24 Jul 2019 Suyoun Kim, Siddharth Dalmia, Florian Metze

We present an end-to-end speech recognition model that learns interaction between two speakers based on the turn-changing information.
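
A minimal sketch of cross-speaker attention, assuming illustrative dimensions: each speaker's encoded turn queries the other speaker's states, so turn-taking context flows across the two channels.

```python
# Cross-attention between two speakers' encoded turns.
import torch
import torch.nn as nn

d = 256
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4, batch_first=True)

spk_a = torch.randn(2, 40, d)  # encoder states for speaker A's turn
spk_b = torch.randn(2, 35, d)  # encoder states for speaker B's turn

# Each speaker attends to the other to pick up cross-speaker context.
a_ctx, _ = attn(query=spk_a, key=spk_b, value=spk_b)
b_ctx, _ = attn(query=spk_b, key=spk_a, value=spk_a)
```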

Speech Recognition +1

Gated Embeddings in End-to-End Speech Recognition for Conversational-Context Fusion

no code implementations ACL 2019 Suyoun Kim, Siddharth Dalmia, Florian Metze

We present a novel conversational-context aware end-to-end speech recognizer based on a gated neural network that incorporates conversational-context/word/speech embeddings.
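
A minimal sketch of such a gating unit, with assumed names and sizes: a sigmoid gate computed from the decoder state and the conversational-context embedding decides how much context to inject.

```python
# Gated fusion of a decoder state with a conversational-context embedding.
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    def __init__(self, d=256):
        super().__init__()
        self.gate = nn.Linear(2 * d, d)

    def forward(self, hidden, context):
        g = torch.sigmoid(self.gate(torch.cat([hidden, context], dim=-1)))
        return g * hidden + (1 - g) * context  # gated mixture

fuse = ContextGate()
out = fuse(torch.randn(2, 256), torch.randn(2, 256))
```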

Sentence, Sentence Embeddings +2

Acoustic-to-Word Models with Conversational Context Information

no code implementations NAACL 2019 Suyoun Kim, Florian Metze

Conversational context information, higher-level knowledge that spans across sentences, can help to recognize a long conversation.

Sentence, Speech Recognition +1

Dialog-context aware end-to-end speech recognition

no code implementations7 Aug 2018 Suyoun Kim, Florian Metze

Existing speech recognition systems are typically built at the sentence level, although it is known that dialog context, e.g., higher-level knowledge that spans across sentences or speakers, can help the processing of long conversations.

Sentence, Speech Recognition +1

Towards Language-Universal End-to-End Speech Recognition

no code implementations6 Nov 2017 Suyoun Kim, Michael L. Seltzer

Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters.
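
The multi-task variant the abstract describes maps naturally to a shared encoder with per-language output layers; the sketch below uses illustrative module sizes and languages.

```python
# Shared encoder, separate per-language output heads.
import torch
import torch.nn as nn

class MultilingualASR(nn.Module):
    def __init__(self, vocab_sizes, feat_dim=80, d=256):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, d, num_layers=2, batch_first=True)  # shared
        self.heads = nn.ModuleDict(
            {lang: nn.Linear(d, v) for lang, v in vocab_sizes.items()}  # separate labels
        )

    def forward(self, feats, lang):
        h, _ = self.encoder(feats)
        return self.heads[lang](h)  # logits over that language's label set

model = MultilingualASR(vocab_sizes={"en": 30, "de": 32})
logits_en = model(torch.randn(2, 100, 80), lang="en")
```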

Multi-Task Learning, Speech Recognition +1

Improved training for online end-to-end speech recognition systems

1 code implementation6 Nov 2017 Suyoun Kim, Michael L. Seltzer, Jinyu Li, Rui Zhao

Achieving high accuracy with end-to-end speech recognizers requires careful parameter initialization prior to training.

Speech Recognition

Joint CTC-Attention based End-to-End Speech Recognition using Multi-task Learning

8 code implementations21 Sep 2016 Suyoun Kim, Takaaki Hori, Shinji Watanabe

Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
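
The core of the joint training objective is an interpolation of a CTC loss on the encoder with the attention decoder's cross-entropy loss; `ctc_weight` below plays the role of the paper's interpolation weight, and shapes are illustrative.

```python
# Joint CTC-attention loss: L = w * L_ctc + (1 - w) * L_att.
import torch
import torch.nn as nn
import torch.nn.functional as F

ctc_loss_fn = nn.CTCLoss(blank=0, zero_infinity=True)

def joint_loss(enc_log_probs, enc_lens, dec_logits, targets, target_lens,
               ctc_weight=0.3):
    # enc_log_probs: (T, B, V) log-probs for CTC; targets must not use the blank id.
    ctc = ctc_loss_fn(enc_log_probs, targets, enc_lens, target_lens)
    # dec_logits: (B, U, V) from the attention decoder, scored against (B, U) targets.
    att = F.cross_entropy(dec_logits.transpose(1, 2), targets)
    return ctc_weight * ctc + (1 - ctc_weight) * att
```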

Multi-Task Learning, Speech Recognition

Environmental Noise Embeddings for Robust Speech Recognition

no code implementations11 Jan 2016 Suyoun Kim, Bhiksha Raj, Ian Lane

We propose a novel deep neural network architecture for speech recognition that explicitly employs knowledge of the background environmental noise within a deep neural network acoustic model.
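
One simple way to realize this, sketched under assumptions (how the noise embedding is estimated and the layer sizes are illustrative), is to tile a per-utterance noise embedding across frames and append it to the acoustic features before the acoustic model:

```python
# Append an environment/noise embedding to each acoustic frame.
import torch
import torch.nn as nn

feat_dim, noise_dim = 40, 16
acoustic_model = nn.Sequential(
    nn.Linear(feat_dim + noise_dim, 512), nn.ReLU(), nn.Linear(512, 3000)
)

frames = torch.randn(2, 200, feat_dim)      # log-mel frames
noise_emb = torch.randn(2, noise_dim)       # per-utterance noise embedding
noise_tiled = noise_emb[:, None, :].expand(-1, frames.size(1), -1)
logits = acoustic_model(torch.cat([frames, noise_tiled], dim=-1))  # senone logits
```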

Management, Multi-Task Learning +2

Recurrent Models for Auditory Attention in Multi-Microphone Distance Speech Recognition

no code implementations19 Nov 2015 Suyoun Kim, Ian Lane

Integration of multiple microphone data is one of the key ways to achieve robust speech recognition in noisy environments or when the speaker is located at some distance from the input device.
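
As a hedged illustration of attention over microphone channels (not the paper's exact model), a small scorer can weight each channel per frame and feed the weighted sum to the recognizer:

```python
# Frame-level attention over microphone channels.
import torch
import torch.nn as nn

n_mics, feat_dim = 4, 40
scorer = nn.Linear(feat_dim, 1)

feats = torch.randn(2, n_mics, 200, feat_dim)              # (B, mics, frames, feat)
weights = torch.softmax(scorer(feats).squeeze(-1), dim=1)  # attention over mics
combined = (weights.unsqueeze(-1) * feats).sum(dim=1)      # (B, frames, feat)
```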

Robust Speech Recognition, Speech Enhancement +1

Multimodal Transfer Deep Learning with Applications in Audio-Visual Recognition

no code implementations9 Dec 2014 Seungwhan Moon, Suyoun Kim, Haohan Wang

We propose a transfer deep learning (TDL) framework that can transfer the knowledge obtained from a single-modal neural network to a network with a different modality.
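
One plausible reading of this transfer idea, sketched under assumptions (the matching loss and tiny architectures here are illustrative): a network in the new modality is trained to match the embeddings of a trained source-modality network on paired examples.

```python
# Cross-modal transfer via embedding matching on paired data.
import torch
import torch.nn as nn

audio_net = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 32))  # trained
video_net = nn.Sequential(nn.Linear(512, 64), nn.ReLU(), nn.Linear(64, 32))  # new modality

audio_x, video_x = torch.randn(8, 128), torch.randn(8, 512)  # paired batch
with torch.no_grad():
    target = audio_net(audio_x)            # frozen source-modality embedding
loss = nn.functional.mse_loss(video_net(video_x), target)  # transfer objective
loss.backward()
```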

Video Recognition
