Search Results for author: Cheng-I Jeff Lai

Found 9 papers, 3 papers with code

Audio-Visual Neural Syntax Acquisition

no code implementations • 11 Oct 2023 • Cheng-I Jeff Lai, Freda Shi, Puyuan Peng, Yoon Kim, Kevin Gimpel, Shiyu Chang, Yung-Sung Chuang, Saurabhchand Bhati, David Cox, David Harwath, Yang Zhang, Karen Livescu, James Glass

We study phrase structure induction from visually-grounded speech.

Language Acquisition

Instruction-Following Speech Recognition

no code implementations • 18 Sep 2023 • Cheng-I Jeff Lai, Zhiyun Lu, Liangliang Cao, Ruoming Pang

Conventional end-to-end Automatic Speech Recognition (ASR) models primarily focus on exact transcription tasks, lacking flexibility for nuanced user interactions.

Automatic Speech Recognition (ASR), +2

Simple and Effective Unsupervised Speech Synthesis

no code implementations • 6 Apr 2022 • Alexander H. Liu, Cheng-I Jeff Lai, Wei-Ning Hsu, Michael Auli, Alexei Baevski, James Glass

We introduce the first unsupervised speech synthesis system based on a simple, yet effective recipe.

Speech Recognition, +2

SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities

1 code implementation • ACL 2022 • Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee

In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB.

Self-Supervised Learning, Transfer Learning

GitHub stars: 2,117

SSAST: Self-Supervised Audio Spectrogram Transformer

3 code implementations • 19 Oct 2021 • Yuan Gong, Cheng-I Jeff Lai, Yu-An Chung, James Glass

Pure Transformer models tend to require more training data than CNNs, and the success of the AST relies on supervised pretraining that requires a large amount of labeled data and a complex training pipeline, which limits the practical usage of AST.

Ranked #1 on Spoken Command Recognition on Speech Command v2

Audio Classification, Emotion Recognition, +4

GitHub stars: 1,026

On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis

no code implementations • 4 Oct 2021 • Cheng-I Jeff Lai, Erica Cooper, Yang Zhang, Shiyu Chang, Kaizhi Qian, Yi-Lun Liao, Yung-Sung Chuang, Alexander H. Liu, Junichi Yamagishi, David Cox, James Glass

Are end-to-end text-to-speech (TTS) models over-parametrized?

Knowledge Distillation, Speech Synthesis

PARP: Prune, Adjust and Re-Prune for Self-Supervised Speech Recognition

no code implementations • NeurIPS 2021 • Cheng-I Jeff Lai, Yang Zhang, Alexander H. Liu, Shiyu Chang, Yi-Lun Liao, Yung-Sung Chuang, Kaizhi Qian, Sameer Khurana, David Cox, James Glass

We investigate the existence of sparse subnetworks in pre-trained speech SSL models that achieve even better low-resource ASR results.

Automatic Speech Recognition (ASR), +3

Cross-Modal Discrete Representation Learning

no code implementations • ACL 2022 • Alexander H. Liu, SouYoung Jin, Cheng-I Jeff Lai, Andrew Rouditchenko, Aude Oliva, James Glass

Recent advances in representation learning have demonstrated an ability to represent information from different modalities such as video, text, and audio in a single high-level embedding vector.

Cross-Modal Retrieval, Quantization, +4

SUPERB: Speech processing Universal PERformance Benchmark

5 code implementations • 3 May 2021 • Shu-wen Yang, Po-Han Chi, Yung-Sung Chuang, Cheng-I Jeff Lai, Kushal Lakhotia, Yist Y. Lin, Andy T. Liu, Jiatong Shi, Xuankai Chang, Guan-Ting Lin, Tzu-Hsien Huang, Wei-Cheng Tseng, Ko-tik Lee, Da-Rong Liu, Zili Huang, Shuyan Dong, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee

SUPERB is a leaderboard to benchmark the performance of a shared model across a wide range of speech processing tasks with minimal architecture changes and labeled data.

Representation Learning, Self-Supervised Learning

GitHub stars: 2,117
