1 code implementation • 20 Feb 2024 • Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee
The sound codec's dual role, minimizing data transmission latency and serving as a tokenizer, underscores its critical importance.
no code implementations • 20 Feb 2024 • Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee
Neural audio codecs were initially introduced to compress audio data into compact codes to reduce transmission latency.
no code implementations • 16 Jan 2024 • Alexander H. Liu, Sung-Lin Yeh, James Glass
We use linear probes to estimate the mutual information between the target information and the learned representations, offering another view of how accessible the target information is from speech representations.
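As a rough illustration of the probing idea, the sketch below trains a simple logistic-regression probe on toy two-dimensional "representations" and uses the bound I(Y;Z) >= H(Y) - H(Y|Z), with the probe's cross-entropy standing in for H(Y|Z). The function names and toy data are hypothetical; this is not the paper's code.

```python
import math
import random

def train_linear_probe(feats, labels, lr=0.5, steps=500):
    """Fit a binary logistic-regression probe with plain gradient descent."""
    dim = len(feats[0])
    w, b, n = [0.0] * dim, 0.0, len(feats)
    for _ in range(steps):
        gw, gb = [0.0] * dim, 0.0
        for x, y in zip(feats, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y  # sigmoid(z) - y
            for i in range(dim):
                gw[i] += err * x[i]
            gb += err
        w = [wi - lr * gi / n for wi, gi in zip(w, gw)]
        b -= lr * gb / n
    return w, b

def cross_entropy_bits(feats, labels, w, b):
    """Mean probe cross-entropy in bits; an upper bound on H(Y|Z)."""
    total = 0.0
    for x, y in zip(feats, labels):
        z = sum(wi * xi for wi, xi in zip(w, x)) + b
        p = 1.0 / (1.0 + math.exp(-z))
        p = min(max(p, 1e-9), 1 - 1e-9)
        total += -(y * math.log2(p) + (1 - y) * math.log2(1 - p))
    return total / len(feats)

def probe_mi_lower_bound(feats, labels):
    """I(Y;Z) >= H(Y) - H(Y|Z), with the probe loss in place of H(Y|Z)."""
    p1 = sum(labels) / len(labels)
    h_y = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
    w, b = train_linear_probe(feats, labels)
    return h_y - cross_entropy_bits(feats, labels, w, b)

# Toy representations: the label is linearly readable from the first coordinate,
# so the probe should recover close to the full 1 bit of label entropy.
random.seed(0)
feats = [[random.gauss(2.0 if y else -2.0, 0.5), random.gauss(0, 1)]
         for y in range(2) for _ in range(50)]
labels = [y for y in range(2) for _ in range(50)]
mi = probe_mi_lower_bound(feats, labels)
```

Because the probe is restricted to be linear, a high estimate indicates the target information is linearly accessible, not merely present.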
no code implementations • 25 Oct 2023 • Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu
Generative models have gained increasing attention in recent years for their remarkable success in tasks that require estimating and sampling from data distributions to generate high-fidelity synthetic data.
1 code implementation • 25 Sep 2023 • Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass
Humans are surrounded by audio signals that include both speech and non-speech sounds.
1 code implementation • 18 May 2023 • Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass
On the other hand, modern large language models (LLMs) exhibit emerging reasoning abilities, but they lack audio perception capabilities.
Ranked #3 on Music Question Answering on MusicQA (using extra training data)
1 code implementation • 18 May 2023 • Heng-Jui Chang, Alexander H. Liu, James Glass
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.
1 code implementation • NeurIPS 2023 • Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR) which combines masked language modeling, self-distillation, and online clustering.
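A minimal sketch of two of the ingredients named above, under toy assumptions: an exponential-moving-average (EMA) update that lets a teacher track the student (the self-distillation part), and an online-clustering step that assigns a teacher feature to its nearest codeword and nudges that codeword toward it. Function names and data are illustrative, not the paper's implementation.

```python
import random

def ema_update(teacher, student, decay=0.999):
    """Self-distillation: teacher weights track the student as an EMA."""
    return [decay * t + (1 - decay) * s for t, s in zip(teacher, student)]

def assign_and_update(codebook, counts, feature, decay=0.9):
    """Online clustering: assign a teacher feature to its nearest codeword,
    then move that codeword toward the feature (an EMA-style k-means step).
    The returned index serves as the discrete prediction target."""
    def dist(c, x):
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    idx = min(range(len(codebook)), key=lambda k: dist(codebook[k], feature))
    counts[idx] += 1
    codebook[idx] = [decay * ci + (1 - decay) * xi
                     for ci, xi in zip(codebook[idx], feature)]
    return idx

# EMA demo: teacher moves a small step toward the student.
teacher = ema_update([0.0, 0.0], [1.0, 1.0], decay=0.9)  # -> [0.1, 0.1]

# Two well-separated "teacher feature" clusters and a 2-entry codebook.
random.seed(0)
codebook = [[0.0, 0.0], [1.0, 1.0]]
counts = [0, 0]
features = ([[random.gauss(-3, 0.1), random.gauss(-3, 0.1)] for _ in range(20)]
            + [[random.gauss(4, 0.1), random.gauss(4, 0.1)] for _ in range(20)])
targets = [assign_and_update(codebook, counts, f) for f in features]
```

In the full method, the student is trained with masked prediction against these cluster indices, while the teacher and codebook are updated only through EMA steps like the ones above.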
1 code implementation • 2 Oct 2022 • Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass
In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.
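At the core of any MAE-style recipe is random patch masking: the encoder sees only a small visible subset of patches and the decoder reconstructs the rest. A hypothetical sketch, assuming 64 spectrogram or image patches and the commonly used 75% mask ratio (names and numbers are illustrative, not the paper's code):

```python
import random

def random_masking(num_patches, mask_ratio, rng):
    """Pick which patches the MAE encoder sees: shuffle patch indices and
    keep the first (1 - mask_ratio) fraction; the rest are masked out and
    only reconstructed by the decoder."""
    idx = list(range(num_patches))
    rng.shuffle(idx)
    keep = int(num_patches * (1 - mask_ratio))
    visible = sorted(idx[:keep])
    masked = sorted(idx[keep:])
    return visible, masked

rng = random.Random(0)
# e.g. an 8x8 grid of patches for one modality, 75% masked
visible, masked = random_masking(64, 0.75, rng)
```

In a multi-modal extension, the same masking is applied per modality, and the encoder processes the concatenated visible audio and video patches.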
Ranked #1 on Audio Tagging on AudioSet (using extra training data)
1 code implementation • 29 Jul 2022 • Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James Glass
Conventional audio-visual models have independent audio and video branches.
Ranked #2 on Multi-modal Classification on AudioSet (using extra training data)
no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass
We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.
1 code implementation • 5 Apr 2022 • Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski
Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.
Automatic Speech Recognition (ASR) +2
no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass
Are end-to-end text-to-speech (TTS) models over-parametrized?
no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass
We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.
Automatic Speech Recognition (ASR) +3
no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass
Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.
1 code implementation • 1 Nov 2020 • Alexander H. Liu, Yu-An Chung, James Glass
Self-supervised speech representations have been shown to be effective in a variety of speech applications.
no code implementations • ACL 2020 • Shun-Po Chuang, Tzu-Wei Sung, Alexander H. Liu, Hung-Yi Lee
Speech translation (ST) aims to learn transformations from speech in the source language to text in the target language.
no code implementations • 16 May 2020 • Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-Yi Lee
The experimental results demonstrate that with only an hour of paired speech data, whether from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices.
no code implementations • 5 May 2020 • Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee
Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data.
no code implementations • 28 Oct 2019 • Alexander H. Liu, Tao Tu, Hung-Yi Lee, Lin-shan Lee
In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances.
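The quantization step can be sketched as nearest-codeword assignment over a frame sequence; collapsing consecutive repeats into a short token sequence is an illustrative simplification of how frame-level codes become a phoneme-like sequence, and all names and data here are toy assumptions rather than the paper's method.

```python
def quantize_sequence(frames, codebook):
    """Map each frame to its nearest codeword index, then collapse
    consecutive repeats to get a short, phoneme-like token sequence."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    ids = [min(range(len(codebook)), key=lambda k: dist(codebook[k], f))
           for f in frames]
    collapsed = [ids[0]] + [t for prev, t in zip(ids, ids[1:]) if t != prev]
    return ids, collapsed

# Toy codebook of three "phoneme" codewords, and a frame sequence that
# dwells on each sound for a few frames (as real speech does).
codebook = [[0.0], [1.0], [2.0]]
frames = [[0.1], [0.0], [1.1], [0.9], [1.0], [2.2], [1.9]]
ids, collapsed = quantize_sequence(frames, codebook)
```

The key property the autoencoder exploits is that the codebook is discrete and small, so the quantized sequence can be aligned with phoneme inventories learned from unpaired text.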
1 code implementation • 28 Oct 2019 • Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee
This allows the decoder to consider semantic consistency during decoding by incorporating the information carried by the transformed decoder feature, which is learned to be close to the target word embedding.
Automatic Speech Recognition (ASR) +1
no code implementations • 2 Nov 2018 • Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee
In this paper we propose a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM).
Automatic Speech Recognition (ASR) +2
1 code implementation • NeurIPS 2018 • Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang
We present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains.