Search Results for author: Hoirin Kim

Found 21 papers, 6 papers with code

STaR: Distilling Speech Temporal Relation for Lightweight Speech Self-Supervised Learning Models

no code implementations • 14 Dec 2023 • Kangwook Jang, Sungnyun Kim, Hoirin Kim

Despite the great performance of Transformer-based speech self-supervised learning (SSL) models, their large parameter size and computational cost make them unfavorable to utilize.

Relation Self-Supervised Learning

Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation

1 code implementation • 19 May 2023 • Kangwook Jang, Sungnyun Kim, Se-Young Yun, Hoirin Kim

Transformer-based speech self-supervised learning (SSL) models, such as HuBERT, show surprising performance in various speech processing tasks.

Self-Supervised Learning

AdaMS: Deep Metric Learning with Adaptive Margin and Adaptive Scale for Acoustic Word Discrimination

no code implementations • 26 Oct 2022 • Myunghun Jung, Hoirin Kim

Many recent loss functions in deep metric learning are expressed with logarithmic and exponential forms, and they involve margin and scale as essential hyper-parameters.

Metric Learning

Anti-Spoofing Using Transfer Learning with Variational Information Bottleneck

no code implementations • 4 Apr 2022 • Youngsik Eom, Yeonghyeon Lee, Ji Sub Um, Hoirin Kim

Furthermore, we show that the proposed system improves performance in low-resource and cross-dataset settings of anti-spoofing task significantly, demonstrating that our system is also robust in terms of data size and data distribution.

Speaker Verification Transfer Learning +1

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

no code implementations • 30 Mar 2022 • Myunghun Jung, Hoirin Kim

Acoustic word embeddings (AWEs) are discriminative representations of speech segments, and learned embedding space reflects the phonetic similarity between words.

Metric Learning Multi-View Learning +1

Meta-Learned Confidence for Transductive Few-shot Learning

no code implementations • 1 Jan 2021 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang

A popular transductive inference technique for few-shot metric-based approaches is to update the prototype of each class with the mean of the most confident query examples, or with a confidence-weighted average of all the query samples.

Few-Shot Learning
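
The confidence-weighted prototype update described in this entry can be illustrated with a minimal sketch. It assumes that confidence is a plain softmax over negative squared Euclidean distances to the initial class means; the paper's contribution is to meta-learn that confidence, which is not reproduced here. The function name `refine_prototypes` and the toy data are hypothetical.

```python
import math

def sq_dist(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def softmax(scores):
    # Numerically stable softmax over a list of scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def refine_prototypes(support, queries):
    """support: dict mapping class -> list of embeddings; queries: list of embeddings.

    Assumption: confidence = softmax(-squared distance to the initial prototypes).
    """
    classes = sorted(support)
    dim = len(queries[0])
    # Initial prototypes are the per-class means of the support embeddings.
    protos = {
        c: [sum(v[i] for v in support[c]) / len(support[c]) for i in range(dim)]
        for c in classes
    }
    # Soft assignment of every query to every class.
    conf = [softmax([-sq_dist(q, protos[c]) for c in classes]) for q in queries]
    refined = {}
    for k, c in enumerate(classes):
        # Support samples count with weight 1, queries with their confidence.
        weight = len(support[c]) + sum(w[k] for w in conf)
        refined[c] = [
            (sum(v[i] for v in support[c])
             + sum(w[k] * q[i] for w, q in zip(conf, queries))) / weight
            for i in range(dim)
        ]
    return refined

support = {0: [[0.0, 0.0]], 1: [[4.0, 4.0]]}
queries = [[1.0, 1.0], [3.0, 3.0]]
refined = refine_prototypes(support, queries)
```

In this toy case each query sits much closer to one class mean, so each refined prototype moves toward its nearby query and is barely affected by the distant one.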

Learning to Maximize Speech Quality Directly Using MOS Prediction for Neural Text-to-Speech

no code implementations • 2 Nov 2020 • Yeunju Choi, Youngmoon Jung, Youngjoo Suh, Hoirin Kim

Although recent neural text-to-speech (TTS) systems have achieved high-quality speech synthesis, there are cases where a TTS system generates low-quality speech, mainly caused by limited training data or information loss during knowledge distillation.

Knowledge Distillation Speech Synthesis +1

A Unified Deep Learning Framework for Short-Duration Speaker Verification in Adverse Environments

no code implementations • 6 Oct 2020 • Youngmoon Jung, Yeunju Choi, Hyungjun Lim, Hoirin Kim

At the same time, there is an increasing requirement for an SV system: it should be robust to short speech segments, especially in noisy and reverberant environments.

Action Detection Activity Detection +2

Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling

no code implementations • 9 Aug 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim

While deep learning has made impressive progress in speech synthesis and voice conversion, the assessment of the synthesized speech is still carried out by human participants.

Speech Synthesis Voice Conversion

Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification

no code implementations • 16 Jul 2020 • Yeunju Choi, Youngmoon Jung, Hoirin Kim

In this paper, we propose a multi-task learning (MTL) method to improve the performance of a MOS prediction model using the following two auxiliary tasks: spoofing detection (SD) and spoofing type classification (STC).

Multi-Task Learning Voice Conversion

Pitchtron: Towards audiobook generation from ordinary people's voices

1 code implementation • Interspeech 2020 • Sunghee Jung, Hoirin Kim

To deal with this issue, we propose two models, hard and soft pitchtron, and release the toolkit and corpus that we have developed.

Multi-Task Network for Noise-Robust Keyword Spotting and Speaker Verification using CTC-based Soft VAD and Global Query Attention

no code implementations • 8 May 2020 • Myunghun Jung, Youngmoon Jung, Jahyun Goo, Hoirin Kim

Keyword spotting (KWS) and speaker verification (SV) have been studied independently although it is known that acoustic and speaker domains are complementary.

Action Detection Activity Detection +2

Improving Multi-Scale Aggregation Using Feature Pyramid Module for Robust Speaker Verification of Variable-Duration Utterances

no code implementations • 7 Apr 2020 • Youngmoon Jung, Seong Min Kye, Yeunju Choi, Myunghun Jung, Hoirin Kim

In this approach, we obtain a speaker embedding vector by pooling single-scale features that are extracted from the last layer of a speaker feature extractor.

Text-Independent Speaker Verification

Meta-Learning for Short Utterance Speaker Recognition with Imbalance Length Pairs

1 code implementation • 6 Apr 2020 • Seong Min Kye, Youngmoon Jung, Hae Beom Lee, Sung Ju Hwang, Hoirin Kim

By combining these two learning schemes, our model outperforms existing state-of-the-art speaker verification models learned with a standard supervised learning framework on short utterances (1-2 seconds) on the VoxCeleb datasets.

Meta-Learning Speaker Identification +2

Dual Attention in Time and Frequency Domain for Voice Activity Detection

1 code implementation • 27 Mar 2020 • Joohyung Lee, Youngmoon Jung, Hoirin Kim

The results show that the focal loss can improve the performance in various imbalance situations compared to the cross entropy loss, a commonly used loss function in VAD.

Action Detection Activity Detection
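
The comparison between focal loss and cross-entropy loss mentioned in this entry can be illustrated with a minimal sketch. It uses the commonly cited binary focal loss form, -(1 - p_t)^gamma * log(p_t), and is not taken from the authors' code. The function names and the value gamma = 2.0 are assumptions for illustration.

```python
import math

def binary_cross_entropy(p, y):
    # p: predicted probability of the positive (speech) class; y: label in {0, 1}.
    p_t = p if y == 1 else 1.0 - p
    return -math.log(p_t)

def binary_focal_loss(p, y, gamma=2.0):
    # The cross-entropy term is scaled by (1 - p_t)^gamma, which shrinks the
    # loss of well-classified frames. gamma = 2.0 is an assumed value.
    p_t = p if y == 1 else 1.0 - p
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

easy_ce = binary_cross_entropy(0.9, 1)   # confident, correct prediction
easy_fl = binary_focal_loss(0.9, 1)
hard_ce = binary_cross_entropy(0.1, 1)   # confident, wrong prediction
hard_fl = binary_focal_loss(0.1, 1)
```

With gamma set to 0 the focal loss reduces to cross-entropy. For larger gamma, easy frames (which tend to come from the majority class in imbalanced data) contribute far less to the total loss than hard frames.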

Meta-Learned Confidence for Few-shot Learning

1 code implementation • 27 Feb 2020 • Seong Min Kye, Hae Beom Lee, Hoirin Kim, Sung Ju Hwang

To tackle this issue, we propose to meta-learn the confidence for each query sample, to assign optimal weights to unlabeled queries such that they improve the model's transductive inference performance on unseen tasks.

Few-Shot Image Classification Few-Shot Learning

Additional Shared Decoder on Siamese Multi-view Encoders for Learning Acoustic Word Embeddings

no code implementations • 1 Oct 2019 • Myunghun Jung, Hyungjun Lim, Jahyun Goo, Youngmoon Jung, Hoirin Kim

Acoustic word embeddings --- fixed-dimensional vector representations of arbitrary-length words --- have attracted increasing interest in query-by-example spoken term detection.

speech-recognition Speech Recognition +1

Self-Adaptive Soft Voice Activity Detection using Deep Neural Networks for Robust Speaker Verification

no code implementations • 26 Sep 2019 • Youngmoon Jung, Yeunju Choi, Hoirin Kim

The first approach is soft VAD, which performs a soft selection of frame-level features extracted from a speaker feature extractor.

Action Detection Activity Detection +2

Learning acoustic word embeddings with phonetically associated triplet network

no code implementations • 7 Nov 2018 • Hyungjun Lim, Younggwan Kim, Youngmoon Jung, Myunghun Jung, Hoirin Kim

Previous research on acoustic word embeddings used in query-by-example spoken term detection has shown remarkable performance improvements when using a triplet network.

Word Embeddings
