no code implementations • 20 Oct 2021 • Ankur Bapna, Yu-An Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
2 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
However, pure Transformer models tend to require more training data than CNNs, and the success of the AST relies on supervised pretraining with a large amount of labeled data and a complex training pipeline, which limits its practical use.
Ranked #1 on Spoken Command Recognition on Speech Command v2
no code implementations • 7 Aug 2021 • Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu
In particular, when compared to published models such as conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.
Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)
1 code implementation • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
Ranked #2 on Time Series on Speech Commands
1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass
Audio tagging is an active research area and has a wide range of applications.
Ranked #3 on Audio Tagging on AudioSet (using extra training data)
1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass
Self-supervised speech representations have been shown to be effective in a variety of speech applications.
no code implementations • 22 Oct 2020 • Yu-An Chung, Yonatan Belinkov, James Glass
We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.
1 code implementation • NAACL 2021 • Yu-An Chung, Chenguang Zhu, Michael Zeng
Besides conducting a self-supervised masked language modeling task on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text.
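The shared-space alignment idea can be sketched with a simple penalty on paired examples. This is only one possible instance (mean-pooling followed by a squared-L2 distance), not SPLAT's exact loss, and the encoder outputs below are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical encoder outputs for one paired speech/text example.
speech_repr = rng.normal(size=(50, 16))   # 50 speech frames, 16-dim each
text_repr = rng.normal(size=(12, 16))     # 12 subword tokens, 16-dim each

# One simple alignment loss: mean-pool each variable-length sequence
# into a single vector, then penalize the distance between the two.
speech_vec = speech_repr.mean(axis=0)
text_vec = text_repr.mean(axis=0)

align_loss = np.sum((speech_vec - text_vec) ** 2)
```

Minimizing such a loss over paired data pulls the two modules' representations toward a shared latent space, while the unpaired masked-language-modeling objectives train each module on its own modality.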
2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass
Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.
no code implementations • ACL 2020 • Yu-An Chung, James Glass
Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.
1 code implementation • 29 Feb 2020 • Wei-Hung Weng, Yu-An Chung, Schrasing Tong
In the era of clinical information explosion, a good clinical text summarization strategy can help improve the clinical workflow.
2 code implementations • 23 Oct 2019 • Yu-An Chung, James Glass
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
2 code implementations • 2 Oct 2019 • Peter J. Liu, Yu-An Chung, Jie Ren
We show results for extractive and human baselines to demonstrate a large abstractive gap in performance.
no code implementations • 17 Jun 2019 • Wei Fang, Yu-An Chung, James Glass
An input text is simultaneously passed into BERT and the Tacotron-2 encoder.
5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.
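The autoregressive objective amounts to predicting a frame several steps ahead of the current one under an L1 reconstruction loss. A minimal NumPy sketch, where a single linear map stands in for the paper's autoregressive network and the frames, `W`, and `n` are all illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T, D, n = 20, 8, 3                 # T frames, D dims, predict n steps ahead

# Toy stand-ins for spectrogram frames of an utterance.
frames = rng.normal(size=(T, D))

# Hypothetical model parameter: predict frame t+n from frame t.
W = rng.normal(size=(D, D)) * 0.1

preds = frames[:-n] @ W            # predictions for frames n, ..., T-1
targets = frames[n:]               # the actual frames n steps ahead

# APC-style training minimizes the L1 distance to the future frame.
l1_loss = np.abs(preds - targets).mean()
```

Predicting ahead (rather than reconstructing the current frame) forces the model to capture structure that generalizes across time, which is what makes the learned representations useful downstream.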
1 code implementation • 4 Feb 2019 • Wei-Hung Weng, Yu-An Chung, Peter Szolovits
As patients' access to their doctors' clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication.
no code implementations • 4 Nov 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.
no code implementations • 30 Aug 2018 • Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.
no code implementations • NeurIPS 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.
Automatic Speech Recognition • Cross-Lingual Word Embeddings • +3
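The cited work learns the mapping between embedding spaces without parallel supervision; the closed-form orthogonal Procrustes step that such methods typically use for refinement (once some word pairs are available) can be sketched on synthetic embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: target embeddings Y are source embeddings X under a
# hidden orthogonal rotation R_true (real embeddings are only
# approximately related this way).
d, n = 5, 40
X = rng.normal(size=(n, d))                  # "source-language" embeddings
R_true, _ = np.linalg.qr(rng.normal(size=(d, d)))
Y = X @ R_true                               # "target-language" embeddings

# Orthogonal Procrustes: the SVD of X^T Y gives the best orthogonal
# map W = U V^T from the source space onto the target space.
U, _, Vt = np.linalg.svd(X.T @ Y)
W = U @ Vt

err = np.abs(X @ W - Y).max()                # near zero in this toy case
```

In the unsupervised setting, the seed pairs come from an adversarially learned initial mapping rather than a dictionary, but the refinement step has this same closed form.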
1 code implementation • 23 Mar 2018 • Yu-An Chung, James Glass
In this paper, we propose Speech2Vec, a novel deep neural network architecture for learning fixed-length vector representations of audio segments excised from a speech corpus. The vectors carry semantic information about the underlying spoken words: segments whose spoken words are semantically similar lie close together in the embedding space.
no code implementations • 22 Nov 2017 • Yu-An Chung, Wei-Hung Weng
Deep neural networks have been investigated for learning latent representations of medical images, yet most studies limit their approach to a single supervised convolutional neural network (CNN), which usually relies heavily on a large-scale annotated dataset for training.
no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass
Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.
no code implementations • 5 Nov 2017 • Yu-An Chung, James Glass
In this paper, we propose Sequence-to-Sequence Audio2Vec, a novel deep neural network architecture for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus. The vectors carry semantic information about the segments: semantically similar segments lie close together in the embedding space.
5 code implementations • 1 Oct 2017 • Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin
libact is a Python package designed to make active learning easier for general users.
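A pool-based active-learning loop of the kind libact packages can be sketched in plain Python. The toy model and the uncertainty-sampling strategy below are illustrative only, not libact's actual API:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pool: 1-D points, true label = 1 iff x > 0.
pool_x = rng.normal(size=100)
pool_y = (pool_x > 0).astype(int)

# Seed the labeled set with one point from each class.
labeled = {int(np.argmin(pool_x)), int(np.argmax(pool_x))}

def predict_proba(x, threshold):
    """Hypothetical model: sigmoid around a learned threshold."""
    return 1.0 / (1.0 + np.exp(-(x - threshold)))

for _ in range(5):
    # "Train": place the threshold between the labeled class means.
    xs = np.array([pool_x[i] for i in labeled])
    ys = np.array([pool_y[i] for i in labeled])
    threshold = (xs[ys == 0].mean() + xs[ys == 1].mean()) / 2

    # Uncertainty sampling: query the unlabeled point closest to p = 0.5.
    unlabeled = [i for i in range(len(pool_x)) if i not in labeled]
    probs = predict_proba(pool_x[unlabeled], threshold)
    query = unlabeled[int(np.argmin(np.abs(probs - 0.5)))]
    labeled.add(query)               # an oracle would supply pool_y[query]
```

libact's value is that the dataset bookkeeping, query strategies, and model wrappers in this loop come ready-made and interchangeable.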
no code implementations • 16 Nov 2016 • Yu-An Chung, Shao-Wen Yang, Hsuan-Tien Lin
While deep neural networks have reached very high classification accuracies in visual applications such as object recognition, detection, and localization, many real-world applications demand different costs for different types of misclassification errors, and thus require cost-sensitive classification algorithms.
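When errors carry unequal costs, the cost-sensitive decision can differ from the plain argmax over class probabilities: the right prediction minimizes expected cost. A minimal sketch with an illustrative cost matrix and probabilities:

```python
import numpy as np

# cost[i][j] = cost of predicting class j when the true class is i;
# here, mistaking class 0 for class 2 (and vice versa) is expensive.
cost = np.array([[0, 1, 10],
                 [1, 0, 1],
                 [5, 1, 0]], dtype=float)

# A model's predicted class probabilities for one example.
probs = np.array([0.45, 0.35, 0.20])

# Plain argmax ignores the costs entirely.
plain = int(np.argmax(probs))

# Cost-sensitive prediction: expected cost of predicting class j is
# sum_i probs[i] * cost[i][j]; pick the class minimizing it.
expected_cost = probs @ cost
sensitive = int(np.argmin(expected_cost))
```

Here the argmax prediction is class 0, but its expected cost (1.35) exceeds that of class 1 (0.65), so the cost-sensitive rule predicts class 1 instead.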
no code implementations • 3 Mar 2016 • Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee
The fixed-dimensional vector representations that Word2Vec offers for (written) words have been shown to be very useful in many application scenarios, in particular because of the semantic information they carry.
no code implementations • 30 Nov 2015 • Yu-An Chung, Hsuan-Tien Lin, Shao-Wen Yang
Deep learning is one of the most prominent machine learning techniques today, achieving state-of-the-art results on a broad range of applications where automatic feature extraction is needed.