Search Results for author: Youngmoon Jung

Found 13 papers, 3 papers with code

Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

no code implementations · 2 Nov 2020 · Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim

Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation.

Knowledge Distillation · Speech Synthesis · +1

A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

no code implementations · 6 Oct 2020 · Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim

At the same time, SV systems are increasingly required to be robust to short speech segments, especially in noisy and reverberant environments.

Action Detection · Activity Detection · +2

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

no code implementations · 9 Aug 2020 · Yeunju Choi, Youngmoon Jung, Hoirin Kim

While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants.

Speech Synthesis · Voice Conversion

Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

no code implementations · 16 Jul 2020 · Yeunju Choi, Youngmoon Jung, Hoirin Kim

In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC).

Multi-Task Learning · Voice Conversion
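The MTL objective described in the abstract can be sketched as a weighted sum of the MOS regression loss and the two auxiliary classification losses. The sketch below is illustrative only: the task weights `w_sd` and `w_stc`, the use of MSE for MOS regression, and softmax cross-entropy for SD/STC are assumptions, not details from the paper.

```python
import numpy as np

def softmax_xent(logits, label):
    # Numerically stable softmax cross-entropy for one example.
    z = logits - logits.max()
    logp = z - np.log(np.exp(z).sum())
    return -logp[label]

def mtl_loss(mos_pred, mos_true, sd_logits, sd_label,
             stc_logits, stc_label, w_sd=0.5, w_stc=0.5):
    # Joint loss: MOS regression (MSE) + spoofing detection (SD)
    # cross-entropy + spoofing type classification (STC) cross-entropy.
    # w_sd / w_stc are illustrative hyperparameters.
    mse = (mos_pred - mos_true) ** 2
    return (mse
            + w_sd * softmax_xent(sd_logits, sd_label)
            + w_stc * softmax_xent(stc_logits, stc_label))
```

With the auxiliary weights set to zero, the objective reduces to plain MOS regression, which is the single-task baseline the MTL method is compared against.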

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

no code implementations · 8 May 2020 · Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim

Keyword spotting (KWS) and speaker verification (SV) have been studied independently, although the acoustic and speaker domains are known to be complementary.

Action Detection · Activity Detection · +2

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

no code implementations · 7 Apr 2020 · Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim

In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor.

Text-Independent Speaker Verification
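The single-scale baseline the abstract describes, pooling frame-level features from the extractor's last layer into one fixed-dimensional speaker embedding, is commonly realized as statistics pooling (mean and standard deviation over time). The sketch below is a generic illustration of that pooling step, not the paper's model.

```python
import numpy as np

def statistics_pooling(last_layer_feats):
    # last_layer_feats: (T, D) single-scale frame features from the
    # final layer of a speaker feature extractor.
    # Concatenating the temporal mean and std yields a fixed-dimensional
    # (2D,) utterance-level speaker embedding regardless of duration T.
    mu = last_layer_feats.mean(axis=0)
    sigma = last_layer_feats.std(axis=0)
    return np.concatenate([mu, sigma])
```

Because only last-layer features are pooled, shallower (multi-scale) information is discarded, which is the limitation the paper's feature pyramid module addresses.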

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

1 code implementation · 6 Apr 2020 · Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models trained with a standard supervised learning framework on short utterances (1-2 seconds) from the VoxCeleb datasets.

Meta-Learning · Speaker Identification · +2

Dual Attention in Time and Frequency Domain for Voice Activity Detection

1 code implementation27 Mar 2020 Joohyung Lee, Youngmoon Jung, Hoirin Kim

The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.

Action Detection · Activity Detection
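The comparison in the abstract, focal loss versus cross-entropy under class imbalance, is easy to make concrete. Focal loss (Lin et al.) down-weights easy, well-classified frames by the factor (1 - p_t)^gamma, so training concentrates on hard frames; with gamma = 0 it reduces to plain cross-entropy. A minimal binary-classification sketch:

```python
import numpy as np

def cross_entropy(p, y):
    # Binary cross-entropy for predicted speech probability p and
    # ground-truth label y (1 = speech, 0 = non-speech).
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

def focal_loss(p, y, gamma=2.0):
    # p_t is the probability assigned to the true class; the modulating
    # factor (1 - p_t)**gamma shrinks the loss on easy frames.
    # gamma = 2.0 is a common choice, not necessarily the paper's value.
    p_t = y * p + (1 - y) * (1 - p)
    return -((1 - p_t) ** gamma) * np.log(p_t)
```

For a frame already classified well (p = 0.9, y = 1), the focal loss is roughly 100x smaller than cross-entropy, which is what lets the minority-class (imbalanced) frames dominate the gradient.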

Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

no code implementations · 1 Oct 2019 · Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Acoustic word embeddings, fixed-dimensional vector representations of arbitrary-length words, have attracted increasing interest in query-by-example spoken term detection.

Speech Recognition · +1

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

no code implementations · 26 Sep 2019 · Youngmoon Jung, Yeunju Choi, Hoirin Kim

The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor.

Action Detection · Activity Detection · +2
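The "soft selection" the abstract mentions can be read as a probability-weighted pooling: instead of hard-dropping frames a VAD marks as non-speech, each frame contributes to the utterance representation in proportion to its speech posterior. The helper below is a generic illustration of that idea, not the paper's implementation.

```python
import numpy as np

def soft_vad_pooling(frame_feats, speech_probs):
    # frame_feats: (T, D) frame-level features from a speaker feature
    # extractor; speech_probs: (T,) per-frame speech posteriors from a
    # VAD network. Frames are softly weighted rather than hard-selected,
    # so borderline frames are attenuated instead of discarded.
    w = speech_probs / (speech_probs.sum() + 1e-8)  # eps avoids div-by-zero
    return (w[:, None] * frame_feats).sum(axis=0)
```

With uniform posteriors this reduces to mean pooling; as a posterior approaches one while the others approach zero, the output converges to that single frame's features, mimicking hard selection.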

Learning acoustic word embeddings with phonetically associated triplet network

no code implementations · 7 Nov 2018 · Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim

Previous research on acoustic word embeddings for query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network.

Word Embeddings
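The triplet network referenced above is trained so that an anchor embedding sits closer to a same-word example (positive) than to a different-word example (negative) by at least a margin. A minimal sketch of the standard hinge-style triplet loss over cosine similarity; the margin value and the choice of cosine (rather than Euclidean) similarity are illustrative assumptions:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Hinge-style triplet loss over acoustic word embeddings: zero once
    # cos(anchor, positive) exceeds cos(anchor, negative) by `margin`,
    # otherwise positive and driving the embeddings apart/together.
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(0.0, margin + cos(anchor, negative) - cos(anchor, positive))
```

A satisfied triplet contributes no gradient, so training effort concentrates on word pairs whose embeddings are still confusable, which is what drives the discrimination the abstract reports.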
