Search Results for author: Yu-An Chung

Found 32 papers, 18 papers with code

Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning

no code implementations • 30 Nov 2015 • Yu-An Chung, Hsuan-Tien Lin, Shao-Wen Yang

Deep learning has been one of the most prominent machine learning techniques nowadays, being the state-of-the-art on a broad range of applications where automatic feature extraction is needed.

General Classification

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

1 code implementation • 3 Mar 2016 • Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee

The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry.

Denoising Dynamic Time Warping

Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation

no code implementations • 16 Nov 2016 • Yu-An Chung, Shao-Wen Yang, Hsuan-Tien Lin

While deep neural networks have succeeded in several visual applications, such as object recognition, detection, and localization, by reaching very high classification accuracies, it is important to note that many real-world applications demand varying costs for different types of misclassification errors, thus requiring cost-sensitive classification algorithms.

Classification General Classification +1

libact: Pool-based Active Learning in Python

5 code implementations • 1 Oct 2017 • Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin

libact is a Python package designed to make active learning easier for general users.

Active Learning

Learning Word Embeddings from Speech

no code implementations • 5 Nov 2017 • Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.

Learning Word Embeddings Word Similarity

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering speech-recognition +2

Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval

no code implementations • 22 Nov 2017 • Yu-An Chung, Wei-Hung Weng

Deep neural networks have been investigated in learning latent representations of medical images, yet most of the studies limit their approach to a single supervised convolutional neural network (CNN), which usually relies heavily on a large-scale annotated dataset for training.

Content-Based Image Retrieval Medical Image Retrieval +1

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

3 code implementations • 23 Mar 2018 • Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.

Learning Word Embeddings Word Similarity

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

no code implementations • NeurIPS 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

no code implementations • 30 Aug 2018 • Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan

We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.

Speech Synthesis

Towards Unsupervised Speech-to-Text Translation

no code implementations • 4 Nov 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.

Denoising Language Modelling +3

Unsupervised Clinical Language Translation

1 code implementation • 4 Feb 2019 • Wei-Hung Weng, Yu-An Chung, Peter Szolovits

As patients' access to their doctors' clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication.

Clinical Language Translation Representation Learning +3

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models

no code implementations • 17 Jun 2019 • Wei Fang, Yu-An Chung, James Glass

An input text is passed simultaneously into BERT and the Tacotron-2 encoder.

Speech Synthesis Transfer Learning

SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders

2 code implementations • 2 Oct 2019 • Peter J. Liu, Yu-An Chung, Jie Ren

We show results for extractive and human baselines to demonstrate a large abstractive gap in performance.

Abstractive Text Summarization Denoising +1

Generative Pre-Training for Speech with Autoregressive Predictive Coding

2 code implementations • 23 Oct 2019 • Yu-An Chung, James Glass

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Representation Learning Speaker Identification +4

Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification

1 code implementation • 29 Feb 2020 • Wei-Hung Weng, Yu-An Chung, Schrasing Tong

In the era of clinical information explosion, a good strategy for clinical text summarization is helpful to improve the clinical workflow.

Negation Negation Detection +1

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

no code implementations • ACL 2020 • Yu-An Chung, James Glass

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.

speech-recognition Speech Recognition +1

Vector-Quantized Autoregressive Predictive Coding

2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

1 code implementation • NAACL 2021 • Yu-An Chung, Chenguang Zhu, Michael Zeng

Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text.

Language Modelling Masked Language Modeling +1

Similarity Analysis of Self-Supervised Speech Representations

no code implementations • 22 Oct 2020 • Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass

Audio tagging is an active research area and has a wide range of applications.

Ranked #6 on Audio Classification on FSD50K (using extra training data)

Audio Classification Audio Tagging +2

AST: Audio Spectrogram Transformer

3 code implementations • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Ranked #1 on Audio Classification on Speech Commands

Audio Classification Audio Tagging +4

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

3 code implementations • 7 Aug 2021 • Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Contrastive Learning Language Modelling +3

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Ranked #1 on Spoken Command Recognition on Speech Command v2

Audio Classification Emotion Recognition +4

SLAM: A Unified Encoder for Speech and Language Modeling via Speech-Text Joint Pre-Training

no code implementations • 20 Oct 2021 • Ankur Bapna, Yu-An Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang

We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.

Language Modelling Text Matching +2

Speech-to-Speech Translation For A Real-world Unwritten Language

no code implementations • arXiv 2022 • Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee

We use English-Taiwanese Hokkien as a case study, and present an end-to-end solution from training data collection, modeling choices to benchmark dataset release.

Speech-to-Speech Translation Translation

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

1 code implementation • 15 Dec 2022 • Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.

Denoising Speech-to-Speech Translation +3