no code implementations • 30 Nov 2015 • Yu-An Chung, Hsuan-Tien Lin, Shao-Wen Yang
Deep learning is currently one of the most prominent machine learning techniques, achieving state-of-the-art performance on a broad range of applications where automatic feature extraction is needed.
1 code implementation • 3 Mar 2016 • Yu-An Chung, Chao-Chung Wu, Chia-Hao Shen, Hung-Yi Lee, Lin-shan Lee
The vector representations of fixed dimensionality for words (in text) offered by Word2Vec have been shown to be very useful in many application scenarios, in particular due to the semantic information they carry.
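The semantic usefulness of Word2Vec-style embeddings is usually measured by cosine similarity: related words should lie closer in the vector space than unrelated ones. The sketch below illustrates this with hypothetical, untrained toy vectors (the values are invented for illustration, not output of any real model):

```python
from math import sqrt

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

# Toy 3-dimensional "word vectors" (illustrative values, not trained).
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.8, 0.9, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

# Semantically related words end up closer in the embedding space.
assert cosine(vectors["king"], vectors["queen"]) > cosine(vectors["king"], vectors["apple"])
```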
no code implementations • 16 Nov 2016 • Yu-An Chung, Shao-Wen Yang, Hsuan-Tien Lin
While deep neural networks have achieved very high classification accuracies in several visual applications, such as object recognition, detection, and localization, many real-world applications assign different costs to different types of misclassification errors and therefore require cost-sensitive classification algorithms.
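The cost-sensitive setting can be illustrated by the standard minimum-expected-cost decision rule over a cost matrix. The sketch below shows that generic rule only, not the deep architecture the paper proposes (the function name and toy numbers are hypothetical):

```python
def cost_sensitive_predict(probs, cost):
    """Pick the class with minimum expected cost.

    probs[k]   : estimated probability that the true class is k
    cost[k][j] : cost of predicting j when the true class is k
    """
    n = len(probs)
    expected = [sum(probs[k] * cost[k][j] for k in range(n)) for j in range(n)]
    return min(range(n), key=expected.__getitem__)

# Class 1 is less likely, but misclassifying a true class-1 example is
# very expensive, so the cost-sensitive decision flips away from the argmax.
probs = [0.6, 0.4]
cost = [
    [0.0, 1.0],   # true class 0
    [10.0, 0.0],  # true class 1
]
assert cost_sensitive_predict(probs, cost) == 1
```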
5 code implementations • 1 Oct 2017 • Yao-Yuan Yang, Shao-Chuan Lee, Yu-An Chung, Tung-En Wu, Si-An Chen, Hsuan-Tien Lin
libact is a Python package designed to make active learning easier for general users.
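The core workflow libact supports is pool-based active learning: repeatedly query the unlabeled example the model is least certain about, have an oracle label it, and retrain. The following is a generic sketch of that loop with uncertainty sampling, not libact's actual API (all names and scores here are hypothetical):

```python
# A minimal pool-based active-learning step with uncertainty sampling.
# This sketches the paradigm libact supports; it is not libact's API.

def uncertainty(prob):
    """Uncertainty of a binary prediction: highest near prob = 0.5."""
    return 1.0 - abs(prob - 0.5) * 2.0

def query(pool, predict_proba):
    """Return the index of the unlabeled example the model is least sure about."""
    return max(pool, key=lambda i: uncertainty(predict_proba(i)))

# Toy "model": a stored probability per unlabeled example (hypothetical values).
scores = {0: 0.95, 1: 0.52, 2: 0.10, 3: 0.80}
pool = [0, 1, 2, 3]

picked = query(pool, scores.get)
assert picked == 1       # the example closest to the decision boundary
pool.remove(picked)      # a human oracle would now label it
```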
no code implementations • 5 Nov 2017 • Yu-An Chung, James Glass
In this paper, we propose Sequence-to-Sequence Audio2Vec, a novel deep neural network architecture for unsupervised learning of fixed-length vector representations of audio segments excised from a speech corpus. The vectors carry semantic information about the segments and lie close to one another in the embedding space when their corresponding segments are semantically similar.
no code implementations • NAACL 2018 • Yu-An Chung, Hung-Yi Lee, James Glass
Although transfer learning has been shown to be successful for tasks like object and speech recognition, its applicability to question answering (QA) has yet to be well-studied.
no code implementations • 22 Nov 2017 • Yu-An Chung, Wei-Hung Weng
Deep neural networks have been investigated for learning latent representations of medical images, yet most studies limit their approach to a single supervised convolutional neural network (CNN), which usually relies heavily on a large-scale annotated dataset for training.
3 code implementations • 23 Mar 2018 • Yu-An Chung, James Glass
In this paper, we propose Speech2Vec, a novel deep neural network architecture for learning fixed-length vector representations of audio segments excised from a speech corpus. The vectors carry semantic information about the underlying spoken words and lie close to one another in the embedding space when those words are semantically similar.
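Because every spoken instance of a word yields its own segment embedding, instances of the same word type can be aggregated into a single word-level vector; simple averaging is one natural choice. The sketch below assumes segment embeddings are already available from a trained encoder (the values shown are invented for illustration):

```python
def mean_vector(vectors):
    """Average several segment embeddings into one word-level embedding."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

# Hypothetical embeddings of three spoken instances of the same word;
# in practice these would come from the trained Speech2Vec encoder.
instances = [
    [0.9, 0.1, 0.0],
    [1.1, 0.0, 0.1],
    [1.0, 0.2, -0.1],
]

word_embedding = mean_vector(instances)
assert len(word_embedding) == 3
assert abs(word_embedding[0] - 1.0) < 1e-9
```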
no code implementations • NeurIPS 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
Recent research has shown that word embedding spaces learned from text corpora of different languages can be aligned without any parallel data supervision.
no code implementations • 30 Aug 2018 • Yu-An Chung, Yuxuan Wang, Wei-Ning Hsu, Yu Zhang, RJ Skerry-Ryan
We demonstrate that the proposed framework enables Tacotron to generate intelligible speech using less than half an hour of paired training data.
no code implementations • 4 Nov 2018 • Yu-An Chung, Wei-Hung Weng, Schrasing Tong, James Glass
We present a framework for building speech-to-text translation (ST) systems using only monolingual speech and text corpora, in other words, speech utterances from a source language and independent text from a target language.
1 code implementation • 4 Feb 2019 • Wei-Hung Weng, Yu-An Chung, Peter Szolovits
As patients' access to their doctors' clinical notes becomes common, translating professional, clinical jargon to layperson-understandable language is essential to improve patient-clinician communication.
5 code implementations • 5 Apr 2019 • Yu-An Chung, Wei-Ning Hsu, Hao Tang, James Glass
This paper proposes a novel unsupervised autoregressive neural model for learning generic speech representations.
no code implementations • 17 Jun 2019 • Wei Fang, Yu-An Chung, James Glass
An input text is simultaneously passed into both BERT and the Tacotron-2 encoder.
2 code implementations • 2 Oct 2019 • Peter J. Liu, Yu-An Chung, Jie Ren
We show results for extractive and human baselines to demonstrate a large abstractive gap in performance.
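A common extractive reference point in summarization is the lead baseline: simply take the first k sentences of the document as the summary. The sketch below is a generic illustration of such a baseline, not the paper's exact setup (the sentence splitter is a simplistic regex):

```python
import re

def lead_k_summary(document, k=3):
    """Extractive lead baseline: take the first k sentences as the summary."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    return " ".join(sentences[:k])

doc = ("We study long-document summarization. "
       "Extractive baselines copy source sentences. "
       "Abstractive models generate new text. "
       "The gap between them can be large.")
assert lead_k_summary(doc, k=2) == (
    "We study long-document summarization. Extractive baselines copy source sentences.")
```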
2 code implementations • 23 Oct 2019 • Yu-An Chung, James Glass
Learning meaningful and general representations from unannotated speech that are applicable to a wide range of tasks remains challenging.
1 code implementation • 29 Feb 2020 • Wei-Hung Weng, Yu-An Chung, Schrasing Tong
In the era of clinical information explosion, an effective strategy for clinical text summarization helps improve the clinical workflow.
no code implementations • ACL 2020 • Yu-An Chung, James Glass
Training objectives based on predictive coding have recently been shown to be very effective at learning meaningful representations from unlabeled speech.
2 code implementations • 17 May 2020 • Yu-An Chung, Hao Tang, James Glass
Autoregressive Predictive Coding (APC), as a self-supervised objective, has enjoyed success in learning representations from large amounts of unlabeled data, and the learned representations are rich for many downstream tasks.
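APC trains a model to predict a frame several steps ahead of the current position from the past context, typically scored with an L1 loss. The sketch below computes that loss for a toy 1-D "spectrogram" using a trivial repeat-the-last-frame predictor in place of the autoregressive neural network (both the toy data and the predictor are illustrative stand-ins):

```python
def apc_l1_loss(frames, predictor, shift=3):
    """Autoregressive predictive coding loss: predict the frame `shift`
    steps ahead from the prefix, scored with an L1 distance."""
    total, count = 0.0, 0
    for t in range(len(frames) - shift):
        pred = predictor(frames[: t + 1])   # uses only past context
        target = frames[t + shift]
        total += sum(abs(p - x) for p, x in zip(pred, target))
        count += 1
    return total / count

# Trivial baseline predictor: repeat the most recent frame (illustrative only;
# APC uses an autoregressive neural network here).
last_frame = lambda prefix: prefix[-1]

frames = [[float(t)] for t in range(10)]  # a toy 1-D "spectrogram"
# Each prediction is off by exactly `shift`, so the mean L1 error is 3.0.
assert abs(apc_l1_loss(frames, last_frame, shift=3) - 3.0) < 1e-9
```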
1 code implementation • NAACL 2021 • Yu-An Chung, Chenguang Zhu, Michael Zeng
In addition to performing self-supervised masked language modeling on the two individual modules using unpaired speech and text, SPLAT aligns representations from the two modules in a shared latent space using a small amount of paired speech and text.
no code implementations • 22 Oct 2020 • Yu-An Chung, Yonatan Belinkov, James Glass
We also design probing tasks to study the correlation between the models' pre-training loss and the amount of specific speech information contained in their learned representations.
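Measuring how pre-training loss relates to the speech information captured by the representations amounts to correlating two quantities across checkpoints, e.g. with Pearson's r. The sketch below uses hypothetical numbers (the loss values and probe accuracies are invented, not results from the paper):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical numbers: pre-training loss per checkpoint vs. the accuracy of
# a probe trained on that checkpoint's representations.
loss = [2.0, 1.5, 1.1, 0.8]
probe_accuracy = [0.55, 0.62, 0.70, 0.78]

# Lower loss coinciding with higher probe accuracy gives a strong negative r.
assert pearson_r(loss, probe_accuracy) < -0.9
```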
1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass
Self-supervised speech representations have been shown to be effective in a variety of speech applications.
1 code implementation • 2 Feb 2021 • Yuan Gong, Yu-An Chung, James Glass
Audio tagging is an active research area and has a wide range of applications.
Ranked #6 on Audio Classification on FSD50K (using extra training data)
4 code implementations • 5 Apr 2021 • Yuan Gong, Yu-An Chung, James Glass
In the past decade, convolutional neural networks (CNNs) have been widely adopted as the main building block for end-to-end audio classification models, which aim to learn a direct mapping from audio spectrograms to corresponding labels.
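An attention-based alternative treats the spectrogram not as an image for convolution but as a sequence of patch tokens. The sketch below shows a simplified non-overlapping patch split on a toy array (AST itself uses overlapping 16x16 patches with a learned projection; this only illustrates the tokenization idea):

```python
def patchify(spectrogram, patch=4):
    """Split a 2-D spectrogram (time x frequency) into non-overlapping
    square patches, flattened into a token sequence for an attention model.
    (AST uses overlapping 16x16 patches; this is a simplified sketch.)"""
    rows, cols = len(spectrogram), len(spectrogram[0])
    tokens = []
    for r in range(0, rows - patch + 1, patch):
        for c in range(0, cols - patch + 1, patch):
            tokens.append([spectrogram[r + i][c + j]
                           for i in range(patch) for j in range(patch)])
    return tokens

# A toy 8x8 "spectrogram" yields a sequence of four 16-dimensional tokens.
spec = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
tokens = patchify(spec, patch=4)
assert len(tokens) == 4
assert len(tokens[0]) == 16
```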
Ranked #1 on Audio Classification on Speech Commands
3 code implementations • 7 Aug 2021 • Yu-An Chung, Yu Zhang, Wei Han, Chung-Cheng Chiu, James Qin, Ruoming Pang, Yonghui Wu
In particular, when compared to published models such as Conformer-based wav2vec 2.0 and HuBERT, our model shows 5% to 10% relative WER reduction on the test-clean and test-other subsets.
Ranked #1 on Speech Recognition on LibriSpeech test-clean (using extra training data)
2 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass
However, pure Transformer models tend to require more training data compared to CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, thus limiting the practical usage of AST.
Ranked #1 on Spoken Command Recognition on Speech Command v2
no code implementations • 20 Oct 2021 • Ankur Bapna, Yu-An Chung, Nan Wu, Anmol Gulati, Ye Jia, Jonathan H. Clark, Melvin Johnson, Jason Riesa, Alexis Conneau, Yu Zhang
We build a single encoder with the BERT objective on unlabeled text together with the w2v-BERT objective on unlabeled speech.
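The BERT objective corrupts a fraction of the input tokens and trains the model to reconstruct them. The sketch below shows only the masking step in simplified form (real BERT sometimes keeps or randomizes the selected tokens instead of always masking; the function and sentence here are illustrative):

```python
import random

def mask_tokens(tokens, mask_token="[MASK]", ratio=0.15, seed=0):
    """BERT-style masking: hide `ratio` of the tokens and return both the
    corrupted sequence and the positions the model must reconstruct.
    (Real BERT also sometimes keeps or randomizes selected tokens.)"""
    rng = random.Random(seed)
    n_mask = max(1, int(len(tokens) * ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = [mask_token if i in positions else t for i, t in enumerate(tokens)]
    return corrupted, positions

tokens = "the quick brown fox jumps over the lazy dog".split()
corrupted, positions = mask_tokens(tokens)
assert len(positions) == 1                          # 15% of 9 tokens, at least one
assert all(corrupted[i] == "[MASK]" for i in positions)
assert sum(t == "[MASK]" for t in corrupted) == 1
```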
no code implementations • arXiv 2022 • Peng-Jen Chen, Kevin Tran, Yilin Yang, Jingfei Du, Justine Kao, Yu-An Chung, Paden Tomasello, Paul-Ambroise Duquenne, Holger Schwenk, Hongyu Gong, Hirofumi Inaguma, Sravya Popuri, Changhan Wang, Juan Pino, Wei-Ning Hsu, Ann Lee
We use English-Taiwanese Hokkien as a case study and present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release.
1 code implementation • 15 Dec 2022 • Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino
We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.
2 code implementations • 22 Aug 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Cora Meglioli, David Dale, Ning Dong, Paul-Ambroise Duquenne, Hady Elsahar, Hongyu Gong, Kevin Heffernan, John Hoffman, Christopher Klaiber, Pengwei Li, Daniel Licht, Jean Maillard, Alice Rakotoarison, Kaushik Ram Sadagopan, Guillaume Wenzek, Ethan Ye, Bapi Akula, Peng-Jen Chen, Naji El Hachem, Brian Ellis, Gabriel Mejia Gonzalez, Justin Haaheim, Prangthip Hansanti, Russ Howes, Bernie Huang, Min-Jae Hwang, Hirofumi Inaguma, Somya Jain, Elahe Kalbassi, Amanda Kallet, Ilia Kulikov, Janice Lam, Daniel Li, Xutai Ma, Ruslan Mavlyutov, Benjamin Peloquin, Mohamed Ramadan, Abinesh Ramakrishnan, Anna Sun, Kevin Tran, Tuan Tran, Igor Tufanov, Vish Vogeti, Carleigh Wood, Yilin Yang, Bokai Yu, Pierre Andrews, Can Balioglu, Marta R. Costa-jussà, Onur Celebi, Maha Elbayad, Cynthia Gao, Francisco Guzmán, Justine Kao, Ann Lee, Alexandre Mourachko, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang
What does it take to create the Babel Fish, a tool that can help individuals translate speech between any two languages?
Ranked #1 on Speech-to-Speech Translation on CVSS (using extra training data)
no code implementations • 14 Sep 2023 • Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks.
1 code implementation • 8 Dec 2023 • Seamless Communication, Loïc Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia Gonzalez, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-jussà, Maha Elbayad, Hongyu Gong, Francisco Guzmán, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson
In this work, we introduce a family of models that enable end-to-end expressive and multilingual translations in a streaming fashion.