Search Results for author: Suwon Shon

Found 20 papers, 7 papers with code

A Comparative Study on E-Branchformer vs Conformer in Speech Recognition, Translation, and Understanding Tasks

2 code implementations 18 May 2023 Yifan Peng, Kwangyoun Kim, Felix Wu, Brian Yan, Siddhant Arora, William Chen, Jiyang Tang, Suwon Shon, Prashant Sridhar, Shinji Watanabe

Conformer, a convolution-augmented Transformer variant, has become the de facto encoder architecture for speech processing due to its superior performance in various tasks, including automatic speech recognition (ASR), speech translation (ST) and spoken language understanding (SLU).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

no code implementations 20 Dec 2022 Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

Dialog Act Classification Question Answering +4

Context-aware Fine-tuning of Self-supervised Speech Models

no code implementations 16 Dec 2022 Suwon Shon, Felix Wu, Kwangyoun Kim, Prashant Sridhar, Karen Livescu, Shinji Watanabe

During the fine-tuning stage, we introduce an auxiliary loss that encourages this context embedding vector to be similar to context vectors of surrounding segments.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

On the Use of External Data for Spoken Named Entity Recognition

1 code implementation NAACL 2022 Ankita Pasad, Felix Wu, Suwon Shon, Karen Livescu, Kyu J. Han

In this work we focus on low-resource spoken named entity recognition (NER) and address the question: Beyond self-supervised pre-training, how can we use external speech and/or text data that are not annotated for the task?

Knowledge Distillation named-entity-recognition +6

Leveraging Pre-trained Language Model for Speech Sentiment Analysis

no code implementations 11 Jun 2021 Suwon Shon, Pablo Brusco, Jing Pan, Kyu J. Han, Shinji Watanabe

In this paper, we explore the use of pre-trained language models to learn sentiment information of written texts for speech sentiment analysis.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Time-Contrastive Learning Based Deep Bottleneck Features for Text-Dependent Speaker Verification

no code implementations 11 May 2019 Achintya kr. Sarkar, Zheng-Hua Tan, Hao Tang, Suwon Shon, James Glass

There are a number of studies about extraction of bottleneck (BN) features from deep neural networks (DNNs) trained to discriminate speakers, pass-phrases and triphone states for improving the performance of text-dependent speaker verification (TD-SV).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

VoiceID Loss: Speech Enhancement for Speaker Verification

no code implementations 7 Apr 2019 Suwon Shon, Hao Tang, James Glass

In this paper, we propose VoiceID loss, a novel loss function for training a speech enhancement model to improve the robustness of speaker verification.

Speaker Verification Speech Enhancement

Domain Attentive Fusion for End-to-end Dialect Identification with Unknown Target Domain

no code implementations 4 Dec 2018 Suwon Shon, Ahmed Ali, James Glass

An important issue for end-to-end systems is to have some knowledge of the application domain, because the system can be vulnerable to use cases that were not seen in the training phase; such a scenario is often referred to as a domain mismatched condition.

Dialect Identification

Noise-tolerant Audio-visual Online Person Verification using an Attention-based Neural Network Fusion

no code implementations 27 Nov 2018 Suwon Shon, Tae-Hyun Oh, James Glass

In this paper, we present a multi-modal online person verification system using both speech and visual signals.

Large-scale Speaker Retrieval on Random Speaker Variability Subspace

no code implementations 27 Nov 2018 Suwon Shon, Young-Gun Lee, Taesu Kim

In this paper, we propose Random Speaker-variability Subspace (RSS) projection to map data into LSH-based hash tables.


Learning pronunciation from a foreign language in speech synthesis networks

2 code implementations 23 Nov 2018 Young-Gun Lee, Suwon Shon, Taesu Kim

First, we train the speech synthesis network bilingually in English and Korean and analyze how the network learns the relations of phoneme pronunciation between the languages.

Speech Synthesis

Unsupervised Representation Learning of Speech for Dialect Identification

no code implementations 12 Sep 2018 Suwon Shon, Wei-Ning Hsu, James Glass

In this paper, we explore the use of a factorized hierarchical variational autoencoder (FHVAE) model to learn an unsupervised latent representation for dialect identification (DID).

Dialect Identification Disentanglement

Frame-level speaker embeddings for text-independent speaker recognition and analysis of end-to-end model

1 code implementation 12 Sep 2018 Suwon Shon, Hao Tang, James Glass

In this paper, we propose a Convolutional Neural Network (CNN) based speaker recognition model for extracting robust speaker embeddings.

Speaker Recognition Text-Independent Speaker Recognition

MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation (MCE) Plan, Dataset and Baseline System

1 code implementation 17 Jul 2018 Suwon Shon, Najim Dehak, Douglas Reynolds, James Glass

The Multitarget Challenge aims to assess how well current speech technology is able to determine whether or not a recorded utterance was spoken by one of a large number of 'blacklisted' speakers.

Audio and Speech Processing Sound

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

2 code implementations 12 Mar 2018 Suwon Shon, Ahmed Ali, James Glass

Although the Siamese network with language embeddings did not achieve as good a result as the end-to-end DID system, the two approaches had good synergy when combined together in a fused system.

Sound Audio and Speech Processing

MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

no code implementations 28 Aug 2017 Suwon Shon, Ahmed Ali, James Glass

In order to achieve a robust ADI system, we explored both Siamese neural network models to learn similarities and dissimilarities among Arabic dialects and i-vector post-processing to adapt to domain mismatches.

Arabic Speech Recognition Dialect Identification +2

KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation

no code implementations 3 Feb 2017 Suwon Shon, Hanseok Ko

Using a development dataset spoken in Cebuano and Mandarin, we prepared the evaluation trials through preliminary experiments to compensate for the language-mismatched condition.

Clustering Speaker Recognition

KU-ISPL Language Recognition System for NIST 2015 i-Vector Machine Learning Challenge

no code implementations 21 Sep 2016 Suwon Shon, Seongkyu Mun, John H. L. Hansen, Hanseok Ko

The experimental results show that the use of duration and score fusion improves language recognition performance by 5% relative in LRiMLC15 cost.

BIG-bench Machine Learning
