Search Results for author: Alexander H. Liu

Found 23 papers, 11 papers with code

Codec-SUPERB: An In-Depth Analysis of Sound Codec Models

1 code implementation20 Feb 2024 Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee

The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.

Towards audio language modeling - an overview

no code implementations20 Feb 2024 Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee

Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency.

Language Modelling

Revisiting Self-supervised Learning of Speech Representation from a Mutual Information Perspective

no code implementations16 Jan 2024 Alexander H. Liu, Sung-Lin Yeh, James Glass

We use linear probes to estimate the mutual information between the target information and learned representations, showing another insight into the accessibility to the target information from speech representations.

Representation Learning Self-Supervised Learning +2

Generative Pre-training for Speech with Flow Matching

no code implementations25 Oct 2023 Alexander H. Liu, Matt Le, Apoorv Vyas, Bowen Shi, Andros Tjandra, Wei-Ning Hsu

Generative models have gained more and more attention in recent years for their remarkable success in tasks that required estimating and sampling data distribution to generate high-fidelity synthetic data.

Speech Enhancement Speech Synthesis +1

Joint Audio and Speech Understanding

1 code implementation25 Sep 2023 Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James Glass

Humans are surrounded by audio signals that include both speech and non-speech sounds.

Listen, Think, and Understand

2 code implementations18 May 2023 Yuan Gong, Hongyin Luo, Alexander H. Liu, Leonid Karlinsky, James Glass

On the other hand, modern large language models (LLMs) exhibit emerging reasoning ability but they lack audio perception capabilities.

Ranked #3 on Music Question Answering on MusicQA (using extra training data)

Language Modelling Large Language Model +1

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

1 code implementation18 May 2023 Heng-Jui Chang, Alexander H. Liu, James Glass

Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.

Acoustic Unit Discovery Clustering +3

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

1 code implementation NeurIPS 2023 Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR) which combines masked language modeling, self-distillation, and online clustering.

Clustering Language Modelling +3

Contrastive Audio-Visual Masked Autoencoder

1 code implementation2 Oct 2022 Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James Glass

In this paper, we first extend the recent Masked Auto-Encoder (MAE) model from a single modality to audio-visual multi-modalities.

 Ranked #1 on Audio Tagging on AudioSet (using extra training data)

Audio Classification Audio Tagging +6

Towards End-to-end Unsupervised Speech Recognition

1 code implementation5 Apr 2022 Alexander H. Liu, Wei-Ning Hsu, Michael Auli, Alexei Baevski

Unsupervised speech recognition has shown great potential to make Automatic Speech Recognition (ASR) systems accessible to every language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Cross-Modal Discrete Representation Learning

no code implementations ACL 2022 Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval Quantization +4

Non-Autoregressive Predictive Coding for Learning Speech Representations from Local Dependencies

1 code implementation1 Nov 2020 Alexander H. Liu, Yu-An Chung, James Glass

Self-supervised speech representations have been shown to be effective in a variety of speech applications.

Representation Learning

Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation

no code implementations16 May 2020 Tao Tu, Yuan-Jui Chen, Alexander H. Liu, Hung-Yi Lee

The experiment results demonstrate that with only an hour of paired speech data, no matter the paired data is from multiple speakers or a single speaker, the proposed model can generate intelligible speech in different voices.

Decoder Speech Synthesis +1

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

no code implementations5 May 2020 Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

Whispering is an important mode of human speech, but no end-to-end recognition results for it were reported yet, probably due to the scarcity of available whispered speech data.

speech-recognition Speech Recognition +1

Towards Unsupervised Speech Recognition and Synthesis with Quantized Speech Representation Learning

no code implementations28 Oct 2019 Alexander H. Liu, Tao Tu, Hung-Yi Lee, Lin-shan Lee

In this paper we propose a Sequential Representation Quantization AutoEncoder (SeqRQ-AE) to learn from primarily unpaired audio data and produce sequences of representations very close to phoneme sequences of speech utterances.

Clustering Quantization +4

Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

1 code implementation28 Oct 2019 Alexander H. Liu, Tzu-Wei Sung, Shun-Po Chuang, Hung-Yi Lee, Lin-shan Lee

This allows the decoder to consider the semantic consistency during decoding by absorbing the information carried by the transformed decoder feature, which is learned to be close to the target word embedding.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Training of End-to-end Speech Recognition Using a Criticizing Language Model

no code implementations2 Nov 2018 Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

In this paper we proposed a novel Adversarial Training (AT) approach for end-to-end speech recognition using a Criticizing Language Model (CLM).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Unified Feature Disentangler for Multi-Domain Image Translation and Manipulation

1 code implementation NeurIPS 2018 Alexander H. Liu, Yen-Cheng Liu, Yu-Ying Yeh, Yu-Chiang Frank Wang

We present a novel and unified deep learning framework which is capable of learning domain-invariant representation from data across multiple domains.

Translation Unsupervised Domain Adaptation

Cannot find the paper you are looking for? You can Submit a new open access paper.