Search Results for author: Chanwoo Kim

Found 27 papers, 4 papers with code

Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

1 code implementation • 29 Jan 2024 • Ian Covert, Chanwoo Kim, Su-In Lee, James Zou, Tatsunori Hashimoto

Many tasks in explainable machine learning, such as data valuation and feature attribution, perform expensive computation for each data point and can be intractable for large datasets.

Data Valuation
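The snippet above gives only the motivation. Judging from the title, the remedy is to amortize the expensive per-point computation by fitting a cheap predictor on noisy estimates; a toy sketch of that general pattern follows (the linear model, the fake quantity, and the noise are all illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical expensive per-point quantity (e.g., a data value or a
# feature attribution); here we fake it and return a cheap, noisy,
# unbiased estimate instead of the exact value.
def noisy_estimate(x):
    exact = x.sum()                       # stand-in for the true quantity
    return exact + rng.normal(scale=1.0)  # unbiased Monte-Carlo-style noise

X = rng.normal(size=(500, 8))
y_noisy = np.array([noisy_estimate(x) for x in X])

# Amortization: fit one cheap predictor on the noisy labels, then reuse
# it for every data point instead of rerunning the expensive computation.
w, *_ = np.linalg.lstsq(X, y_noisy, rcond=None)
amortized_values = X @ w
```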

Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech

no code implementations • 19 Jan 2024 • Abhinav Garg, Jiyeon Kim, Sushil Khyalia, Chanwoo Kim, Dhananjaya Gowda

Grapheme-to-Phoneme (G2P) is an essential first step in any modern, high-quality Text-to-Speech (TTS) system.

Self-Supervised Learning

Latent Filling: Latent Space Data Augmentation for Zero-shot Speech Synthesis

no code implementations • 5 Oct 2023 • Jae-Sung Bae, Joun Yeop Lee, Ji-Hyun Lee, Seongkyu Mun, Taehwa Kang, Hoon-Young Cho, Chanwoo Kim

Previous works in zero-shot text-to-speech (ZS-TTS) have attempted to improve their systems by enlarging the training data through crowd-sourcing or by augmenting existing speech data.

Data Augmentation • Speech Synthesis

Transformer-Based Unified Recognition of Two Hands Manipulating Objects

1 code implementation • CVPR 2023 • Hoseong Cho, Chanwoo Kim, Jihyeon Kim, Seongyeong Lee, Elkhan Ismayilzada, Seungryul Baek

In our framework, we take as input the whole image depicting two hands, an object, and their interactions, and jointly estimate three kinds of information from each frame: the poses of the two hands, the pose of the object, and the object type.

Object
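A minimal sketch of the joint-estimation interface the abstract describes; all dimensions, names, and the flat-feature assumption are illustrative, not the paper's implementation:

```python
import torch
import torch.nn as nn

class UnifiedHandObjectHead(nn.Module):
    """Illustrative head mapping per-frame image features to the three
    outputs named in the abstract: two hand poses, an object pose, and
    an object type. All dimensions are assumptions, not the paper's."""
    def __init__(self, feat_dim=256, num_joints=21, num_obj_classes=10):
        super().__init__()
        self.left_hand = nn.Linear(feat_dim, num_joints * 3)    # 3D joints, left hand
        self.right_hand = nn.Linear(feat_dim, num_joints * 3)   # 3D joints, right hand
        self.obj_pose = nn.Linear(feat_dim, 6)                  # rotation + translation
        self.obj_class = nn.Linear(feat_dim, num_obj_classes)   # object-type logits

    def forward(self, frame_features):   # (batch, feat_dim) image features
        return {
            "left_hand_pose": self.left_hand(frame_features),
            "right_hand_pose": self.right_hand(frame_features),
            "object_pose": self.obj_pose(frame_features),
            "object_type_logits": self.obj_class(frame_features),
        }

outputs = UnifiedHandObjectHead()(torch.randn(2, 256))
```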

Macro-block dropout for improved regularization in training end-to-end speech recognition models

no code implementations • 29 Dec 2022 • Chanwoo Kim, Sathish Indurthi, Jinhwan Park, Wonyong Sung

In our work, we define a macro-block that contains a large number of units from the input to a Recurrent Neural Network (RNN).

Speech Recognition
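The abstract defines a macro-block as a large group of units that is dropped as a whole. A minimal sketch of that idea (block count, drop rate, and the inverted-dropout rescaling are assumptions, not the paper's exact formulation):

```python
import torch

def macro_block_dropout(x, num_blocks=4, p_drop=0.2, training=True):
    """Zero out whole contiguous blocks ("macro-blocks") of feature
    units rather than individual units; a sketch of the idea in the
    abstract, not the paper's exact formulation."""
    if not training or p_drop == 0.0:
        return x
    batch, dim = x.shape
    assert dim % num_blocks == 0, "feature dim must split into equal blocks"
    # One keep/drop decision per macro-block, shared by all of its units.
    keep = (torch.rand(batch, num_blocks, device=x.device) >= p_drop).float()
    mask = keep.repeat_interleave(dim // num_blocks, dim=1)
    # Inverted-dropout rescaling so the expected activation is unchanged.
    return x * mask / (1.0 - p_drop)

x = torch.randn(8, 128)        # e.g., inputs to an RNN layer
y = macro_block_dropout(x)
```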

An Empirical Study on L2 Accents of Cross-lingual Text-to-Speech Systems via Vowel Space

no code implementations • 6 Nov 2022 • Jihwan Lee, Jae-Sung Bae, Seongkyu Mun, Heejin Choi, Joun Yeop Lee, Hoon-Young Cho, Chanwoo Kim

With the recent developments in cross-lingual Text-to-Speech (TTS) systems, L2 (second-language, or foreign) accent problems arise.

Transformer-based Global 3D Hand Pose Estimation in Two Hands Manipulating Objects Scenarios

no code implementations • 20 Oct 2022 • Hoseong Cho, Donguk Kim, Chanwoo Kim, Seongyeong Lee, Seungryul Baek

In this challenge, we aim to estimate global 3D hand poses from an input image in which two hands and an object are interacting, captured from an egocentric viewpoint.

3D Hand Pose Estimation

Contrastive Corpus Attribution for Explaining Representations

1 code implementation • 30 Sep 2022 • Chris Lin, Hugh Chen, Chanwoo Kim, Su-In Lee

To address this, we propose contrastive corpus similarity, a novel and semantically meaningful scalar explanation output based on a reference corpus and a contrasting foil set of samples.

Contrastive Learning • Object Localization
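The snippet defines the explanation output as similarity to a reference corpus contrasted against a foil set. A minimal sketch, under the assumption that similarity is mean cosine similarity in a representation space (the paper's exact definition may differ):

```python
import numpy as np

def contrastive_corpus_similarity(z, corpus, foil):
    """Sketch: mean cosine similarity of a sample's representation z to
    a reference corpus, minus its mean similarity to the foil set."""
    def mean_cos(v, group):
        v = v / np.linalg.norm(v)
        g = group / np.linalg.norm(group, axis=1, keepdims=True)
        return float(np.mean(g @ v))
    return mean_cos(z, corpus) - mean_cos(z, foil)

rng = np.random.default_rng(0)
z = rng.normal(size=16)              # representation of the sample to explain
corpus = rng.normal(size=(32, 16))   # reference corpus representations
foil = rng.normal(size=(64, 16))     # contrasting foil representations
score = contrastive_corpus_similarity(z, corpus, foil)
```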

Learning to Estimate Shapley Values with Vision Transformers

2 code implementations • 10 Jun 2022 • Ian Covert, Chanwoo Kim, Su-In Lee

Transformers have become a default architecture in computer vision, but understanding what drives their predictions remains a challenging problem.

Into-TTS : Intonation Template Based Prosody Control System

no code implementations • 4 Apr 2022 • Jihwan Lee, Joun Yeop Lee, Heejin Choi, Seongkyu Mun, Sangjun Park, Jae-Sung Bae, Chanwoo Kim

Two proposed modules are added to the end-to-end TTS framework: an intonation predictor and an intonation encoder.

Language Modelling
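A minimal sketch of how such a pair of modules could slot into a TTS pipeline; the discrete template inventory, dimensions, and argmax selection are all assumptions, not the paper's design:

```python
import torch
import torch.nn as nn

class IntonationEncoder(nn.Module):
    """Sketch: embed a discrete intonation-template id into a vector
    that conditions the TTS decoder. Sizes are illustrative."""
    def __init__(self, num_templates=8, dim=128):
        super().__init__()
        self.embed = nn.Embedding(num_templates, dim)
    def forward(self, template_id):
        return self.embed(template_id)

class IntonationPredictor(nn.Module):
    """Sketch: predict a template id from sentence-level text features."""
    def __init__(self, text_dim=256, num_templates=8):
        super().__init__()
        self.proj = nn.Linear(text_dim, num_templates)
    def forward(self, text_features):
        return self.proj(text_features).argmax(dim=-1)

text_feats = torch.randn(2, 256)
template_id = IntonationPredictor()(text_feats)   # inference-time choice
conditioning = IntonationEncoder()(template_id)   # fed to the TTS decoder
```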

Two-Pass End-to-End ASR Model Compression

no code implementations • 8 Jan 2022 • Nauman Dawalatabad, Tushar Vatsal, Ashutosh Gupta, Sungsoo Kim, Shatrughan Singh, Dhananjaya Gowda, Chanwoo Kim

With the use of popular transducer-based models, it has become possible to practically deploy streaming speech recognition models on small devices [1].

Knowledge Distillation • Model Compression +3

A comparison of streaming models and data augmentation methods for robust speech recognition

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

However, we observe that the training of MoChA models appears to be more sensitive to various factors, such as the characteristics of the training sets and the incorporation of additional augmentation techniques.

Data Augmentation • Robust Speech Recognition +1

Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

no code implementations • 19 Nov 2021 • Jiyeon Kim, Mehul Kumar, Dhananjaya Gowda, Abhinav Garg, Chanwoo Kim

To improve the accuracy of a low-resource Italian ASR, we leverage a well-trained English model, unlabeled text corpus, and unlabeled audio corpus using transfer learning, TTS augmentation, and SSL respectively.

Data Augmentation • Speech Recognition +2

Streaming end-to-end speech recognition with jointly trained neural feature enhancement

no code implementations • 4 May 2021 • Chanwoo Kim, Abhinav Garg, Dhananjaya Gowda, Seongkyu Mun, Changwoo Han

In this paper, we present a streaming end-to-end speech recognition model based on Monotonic Chunkwise Attention (MoChA) jointly trained with enhancement layers.

Speech Recognition
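A sketch of the joint arrangement the abstract describes: enhancement layers sit in front of the recognizer and receive gradients from a single ASR loss. Module sizes are assumptions, and MoChA itself is reduced to a placeholder encoder:

```python
import torch
import torch.nn as nn

class JointEnhancementASR(nn.Module):
    """Sketch: feature-enhancement layers feeding a streaming encoder,
    trained end-to-end with one ASR loss so the enhancement is optimized
    for recognition accuracy rather than for signal quality."""
    def __init__(self, feat_dim=80, hidden=256, vocab=100):
        super().__init__()
        self.enhance = nn.Sequential(
            nn.Linear(feat_dim, feat_dim), nn.ReLU(),
            nn.Linear(feat_dim, feat_dim),
        )
        self.encoder = nn.LSTM(feat_dim, hidden, batch_first=True)  # MoChA placeholder
        self.classifier = nn.Linear(hidden, vocab)

    def forward(self, feats):              # feats: (batch, time, feat_dim)
        enhanced = self.enhance(feats)
        encoded, _ = self.encoder(enhanced)
        return self.classifier(encoded)    # per-frame token logits

logits = JointEnhancementASR()(torch.randn(4, 50, 80))
```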

Faster Re-translation Using Non-Autoregressive Model For Simultaneous Neural Machine Translation

no code implementations • 29 Dec 2020 • Hyojung Han, Sathish Indurthi, Mohd Abbas Zaidi, Nikhil Kumar Lakumarapu, Beomseok Lee, Sangha Kim, Chanwoo Kim, Inchul Hwang

The current re-translation approaches are based on autoregressive sequence generation models (ReTA), which generate target tokens in the (partial) translation sequentially.

Machine Translation • TAR +1
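The snippet contrasts autoregressive re-translation (ReTA) with the paper's faster non-autoregressive alternative. A schematic of the re-translation loop itself (the model call is a hypothetical placeholder, not a real API):

```python
def retranslate_stream(source_tokens, translate_fn):
    """Sketch of re-translation for simultaneous MT: each time a new
    source token arrives, re-generate the *entire* target from scratch.
    An autoregressive model decodes the target token by token; a
    non-autoregressive model can emit all target tokens in parallel,
    which is where the speedup comes from."""
    prefix = []
    for token in source_tokens:
        prefix.append(token)
        yield translate_fn(prefix)   # full hypothesis for the current prefix

# Hypothetical stand-in for a trained MT model:
def dummy_translate(prefix):
    return ["<tgt>"] * len(prefix)

for hypothesis in retranslate_stream(["ich", "bin", "hier"], dummy_translate):
    print(hypothesis)
```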

A review of on-device fully neural end-to-end automatic speech recognition algorithms

no code implementations • 14 Dec 2020 • Chanwoo Kim, Dhananjaya Gowda, Dongsoo Lee, Jiyeon Kim, Ankur Kumar, Sungsoo Kim, Abhinav Garg, Changwoo Han

Conventional speech recognition systems comprise a large number of discrete components such as an acoustic model, a language model, a pronunciation model, a text-normalizer, an inverse-text normalizer, a decoder based on a Weighted Finite State Transducer (WFST), and so on.

Automatic Speech Recognition (ASR) +3

Small energy masking for improved neural network training for end-to-end speech recognition

no code implementations • 15 Feb 2020 • Chanwoo Kim, Kwangyoun Kim, Sathish Reddy Indurthi

More specifically, a time-frequency bin is masked if the filterbank energy in this bin is less than a certain energy threshold.

Speech Recognition
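The snippet states the masking rule directly: a time-frequency bin is masked when its filterbank energy falls below a threshold. A minimal sketch of that rule; how the threshold is chosen (the paper's key detail) is not reproduced here and the values below are assumptions:

```python
import numpy as np

def small_energy_mask(filterbank, threshold):
    """Sketch: zero out time-frequency bins whose filterbank energy is
    below a threshold, per the rule stated in the abstract."""
    masked = filterbank.copy()
    masked[filterbank < threshold] = 0.0
    return masked

rng = np.random.default_rng(0)
fbank = rng.random((100, 40))            # (time frames, mel bins) energies
out = small_energy_mask(fbank, threshold=0.1)
```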

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

no code implementations • 28 Dec 2019 • Abhinav Garg, Dhananjaya Gowda, Ankur Kumar, Kwangyoun Kim, Mehul Kumar, Chanwoo Kim

In this paper, we propose a refined multi-stage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models.

Language Modelling • Multi-Task Learning

End-to-end training of a large vocabulary end-to-end speech recognition system

no code implementations • 22 Dec 2019 • Chanwoo Kim, Sungsoo Kim, Kwangyoun Kim, Mehul Kumar, Jiyeon Kim, Kyungmin Lee, Changwoo Han, Abhinav Garg, Eunhyang Kim, Minkyoo Shin, Shatrughan Singh, Larry Heck, Dhananjaya Gowda

Our end-to-end speech recognition system, built using this training infrastructure, achieved a 2.44% WER on the LibriSpeech test-clean set after applying shallow fusion with a Transformer language model (LM).

Data Augmentation • Language Modelling +2
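The reported WER is obtained with shallow fusion against a Transformer LM. Shallow fusion itself is a simple per-step score interpolation during beam search; a sketch (the weight and the toy distributions are assumptions):

```python
import numpy as np

def shallow_fusion_scores(asr_log_probs, lm_log_probs, lm_weight=0.3):
    """Shallow fusion: at each decoding step, interpolate the ASR
    model's next-token log-probabilities with an external LM's:
    score(y) = log p_asr(y) + lm_weight * log p_lm(y)."""
    return asr_log_probs + lm_weight * lm_log_probs

rng = np.random.default_rng(0)
vocab = 32
asr_lp = np.log(rng.dirichlet(np.ones(vocab)))  # fake ASR next-token posteriors
lm_lp = np.log(rng.dirichlet(np.ones(vocab)))   # fake LM next-token posteriors
next_token = int(np.argmax(shallow_fusion_scores(asr_lp, lm_lp)))
```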

Power-law nonlinearity with maximally uniform distribution criterion for improved neural network training in automatic speech recognition

no code implementations • 22 Dec 2019 • Chanwoo Kim, Mehul Kumar, Kwangyoun Kim, Dhananjaya Gowda

With power-function-based MUD, we apply a power-function nonlinearity whose coefficients are chosen to maximize the likelihood under the assumption that the nonlinearity outputs follow a uniform distribution.

Automatic Speech Recognition (ASR) +1
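A toy sketch of the criterion in the snippet, fitting the exponent of a power-law nonlinearity so its outputs look as uniform as possible. The grid search and KS-style uniformity score are stand-ins for the paper's likelihood-based estimator:

```python
import numpy as np

def uniformity_score(y):
    """Crude uniformity measure: negative KS-style distance between the
    empirical CDF of y (rescaled to [0, 1]) and the uniform CDF."""
    y = np.sort((y - y.min()) / (y.max() - y.min() + 1e-12))
    grid = (np.arange(len(y)) + 0.5) / len(y)
    return -np.max(np.abs(y - grid))

def fit_power_exponent(energies, candidates=np.linspace(0.05, 1.0, 20)):
    """Pick the exponent p whose power-law output y = x**p looks most
    uniform; a stand-in for the paper's maximum-likelihood fit."""
    return max(candidates, key=lambda p: uniformity_score(energies ** p))

rng = np.random.default_rng(0)
energies = rng.gamma(shape=0.5, scale=1.0, size=10000)  # fake filterbank energies
p = fit_power_exponent(energies)
```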

Data Efficient Direct Speech-to-Text Translation with Modality Agnostic Meta-Learning

no code implementations • 11 Nov 2019 • Sathish Indurthi, Houjeung Han, Nikhil Kumar Lakumarapu, Beomseok Lee, Insoo Chung, Sangha Kim, Chanwoo Kim

In the meta-learning phase, the parameters of the model are exposed to vast amounts of speech transcripts (e.g., English ASR) and text translations (e.g., English-German MT).

Automatic Speech Recognition (ASR) +6
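A schematic of the meta-learning phase the snippet describes, in the spirit of Reptile-style first-order meta-learning; the toy tasks, quadratic loss, and step sizes are placeholders, not the paper's algorithm:

```python
import numpy as np

class Task:
    """Toy task with a quadratic loss around a task-specific optimum,
    standing in for a source task such as English ASR or En-De MT."""
    def __init__(self, target):
        self.target = np.asarray(target, dtype=float)
    def grad(self, theta):
        return theta - self.target   # gradient of 0.5 * ||theta - target||^2

def meta_train(theta, tasks, inner_lr=0.1, meta_lr=0.05, steps=200, seed=0):
    """First-order meta-learning sketch: adapt on one sampled task, then
    nudge the shared initialization toward the adapted weights."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        task = tasks[rng.integers(len(tasks))]
        adapted = theta - inner_lr * task.grad(theta)   # one inner step
        theta = theta + meta_lr * (adapted - theta)     # meta update
    return theta

theta = meta_train(np.zeros(2), [Task([1.0, 0.0]), Task([0.0, 1.0])])
```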
