Search Results for author: Yu-An Chung

Found 32 papers, 18 papers with code

UnitY: Two-pass Direct Speech-to-speech Translation with Discrete Units

1 code implementation 15 Dec 2022 Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino

We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.

Denoising Speech-to-Speech Translation +3

SSAST: Self-Supervised Audio Spectrogram Transformer

2 code implementations 19 Oct 2021 Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.

Audio Classification Emotion Recognition +4

W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training

3 code implementations 7 Aug 2021 Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu

In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.

Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)

Contrastive Learning Language Modelling +3

AST: Audio Spectrogram Transformer

3 code implementations 5 Apr 2021 Yuan Gong, Yu-An Chung, James Glass

In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.

Audio Classification Audio Tagging +4

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation 1 Nov 2020 Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

Similarity Analysis of Self-Supervised Speech Representations

no code implementations 22 Oct 2020 Yu-An Chung, Yonatan Belinkov, James Glass

We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.

Representation Learning

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

1 code implementation NAACL 2021 Yu-An Chung, Chenguang Zhu, Michael Zeng

Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text.

Language Modelling Masked Language Modeling +1

Vector-Quantized Autoregressive Predictive Coding

2 code implementations 17 May 2020 Yu-An Chung, Hao Tang, James Glass

Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.

Improved Speech Representations with Multi-Target Autoregressive Predictive Coding

no code implementations ACL 2020 Yu-An Chung, James Glass

Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.

Speech Recognition +1

Clinical Text Summarization with Syntax-Based Negation and Semantic Concept Identification

1 code implementation 29 Feb 2020 Wei-Hung Weng, Yu-An Chung, Schrasing Tong

In the era of clinical information explosion, a good strategy for clinical text summarization is helpful to improve the clinical workflow.

Negation Negation Detection +1

Generative Pre-Training for Speech with Autoregressive Predictive Coding

2 code implementations 23 Oct 2019 Yu-An Chung, James Glass

Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.

Representation Learning Speaker Identification +4

SummAE: Zero-Shot Abstractive Text Summarization using Length-Agnostic Auto-Encoders

2 code implementations 2 Oct 2019 Peter J. Liu, Yu-An Chung, Jie Ren

We show results for extractive and human baselines to demonstrate a large abstractive gap in performance.

Abstractive Text Summarization Denoising +1

An Unsupervised Autoregressive Model for Speech Representation Learning

5 code implementations 5 Apr 2019 Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass

This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.

General Classification Representation Learning +1

Unsupervised Clinical Language Translation

1 code implementation 4 Feb 2019 Wei-Hung Weng, Yu-An Chung, Peter Szolovits

As patients' access to their doctors' clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication.

Clinical Language Translation Representation Learning +3

Towards Unsupervised Speech-to-Text Translation

no code implementations 4 Nov 2018 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.

Denoising Language Modelling +3

Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis

no code implementations 30 Aug 2018 Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan

We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.

Speech Synthesis

Unsupervised Cross-Modal Alignment of Speech and Text Embedding Spaces

no code implementations NeurIPS 2018 Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass

Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech

3 code implementations 23 Mar 2018 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Speech2Vec, for learning fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the underlying spoken words, and are close to other vectors in the embedding space if their corresponding underlying spoken words are semantically similar.

Learning Word Embeddings Word Similarity

Learning Deep Representations of Medical Images using Siamese CNNs with Application to Content-Based Image Retrieval

no code implementations 22 Nov 2017 Yu-An Chung, Wei-Hung Weng

Deep neural networks have been investigated for learning latent representations of medical images, yet most studies limit their approach to a single supervised convolutional neural network (CNN), which usually relies heavily on a large-scale annotated dataset for training.

Content-Based Image Retrieval Medical Image Retrieval +1

Supervised and Unsupervised Transfer Learning for Question Answering

no code implementations NAACL 2018 Yu-An Chung, Hung-Yi Lee, James Glass

Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.

Question Answering Speech Recognition +3

Learning Word Embeddings from Speech

no code implementations 5 Nov 2017 Yu-An Chung, James Glass

In this paper, we propose a novel deep neural network architecture, Sequence-to-Sequence Audio2Vec, for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus, where the vectors contain semantic information pertaining to the segments, and are close to other vectors in the embedding space if their corresponding segments are semantically similar.

Learning Word Embeddings Word Similarity

libact: Pool-based Active Learning in Python

5 code implementations 1 Oct 2017 Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin

libact is a Python package designed to make active learning easier for general users.

Active Learning

Cost-Sensitive Deep Learning with Layer-Wise Cost Estimation

no code implementations 16 Nov 2016 Yu-An Chung, Shao-Wen Yang, Hsuan-Tien Lin

While deep neural networks have succeeded in several visual applications, such as object recognition, detection, and localization, by reaching very high classification accuracies, it is important to note that many real-world applications demand varying costs for different types of misclassification errors, thus requiring cost-sensitive classification algorithms.

Classification General Classification +1

Audio Word2Vec: Unsupervised Learning of Audio Segment Representations using Sequence-to-sequence Autoencoder

1 code implementation 3 Mar 2016 Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee

The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry.

Denoising Dynamic Time Warping

Cost-aware Pre-training for Multiclass Cost-sensitive Deep Learning

no code implementations 30 Nov 2015 Yu-An Chung, Hsuan-Tien Lin, Shao-Wen Yang

Deep learning is one of the most prominent machine learning techniques today, achieving state-of-the-art results on a broad range of applications where automatic feature extraction is needed.

General Classification
