Search Results for author: Yeon-Jun Kim

Found 6 papers, 0 papers with code

1SPU: 1-step Speech Processing Unit

no code implementations • 8 Nov 2023 • Karan Singla, Shahab Jalalvand, Yeon-Jun Kim, Antonio Moreno Daniel, Srinivas Bangalore, Andrej Ljolje, Ben Stern

Recent studies have made some progress in refining end-to-end (E2E) speech recognition encoders by applying Connectionist Temporal Classification (CTC) loss to enhance named entity recognition within transcriptions.

Automatic Speech Recognition (ASR)

E2E Spoken Entity Extraction for Virtual Agents

no code implementations • 16 Feb 2023 • Karan Singla, Yeon-Jun Kim, Srinivas Bangalore

In human-computer conversations, extracting entities such as names, street addresses and email addresses from speech is a challenging task.

Cross-stitched Multi-modal Encoders

no code implementations • 20 Apr 2022 • Karan Singla, Daniel Pressel, Ryan Price, Bhargav Srinivas Chinnari, Yeon-Jun Kim, Srinivas Bangalore

In this paper, we propose a novel architecture for multi-modal speech and text input.

Classification

Seq-2-Seq based Refinement of ASR Output for Spoken Name Capture

no code implementations • 29 Mar 2022 • Karan Singla, Shahab Jalalvand, Yeon-Jun Kim, Ryan Price, Daniel Pressel, Srinivas Bangalore

Person name capture from human speech is a difficult task in human-machine conversations.

A Hybrid Approach to Scalable and Robust Spoken Language Understanding in Enterprise Virtual Agents

no code implementations • NAACL 2021 • Ryan Price, Mahnoosh Mehrabani, Narendra Gupta, Yeon-Jun Kim, Shahab Jalalvand, Minhua Chen, Yanjie Zhao, Srinivas Bangalore

Spoken language understanding (SLU) extracts the intended meaning from a user utterance and is a critical component of conversational virtual agents.

Spoken Language Understanding

Building Text-To-Speech Voices in the Cloud

no code implementations • LREC 2012 • Alistair Conkie, Thomas Okken, Yeon-Jun Kim, Giuseppe Di Fabbrizio

The AT&T VoiceBuilder provides a new tool to researchers and practitioners who want to have their voices synthesized by a high-quality commercial-grade text-to-speech system without the need to install, configure, or manage speech processing software and equipment. It is implemented as a web service on the AT&T Speech Mashup Portal. The system records and validates users' utterances, processes them to build a synthetic voice and provides a web service API to make the voice available to real-time applications through a scalable cloud-based processing platform.

Speech Recognition, Speech Synthesis
