1 code implementation • 21 Sep 2024 • Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Jiawei Du, Ke-Han Lu, Alexander H. Liu, Ho-Lam Chung, Yuan-Kuei Wu, Dongchao Yang, Songxiang Liu, Yi-Chiao Wu, Xu Tan, James Glass, Shinji Watanabe, Hung-Yi Lee
Neural audio codec models are becoming increasingly important as they serve as tokenizers for audio, enabling efficient transmission or facilitating speech language modeling.
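As a minimal sketch of the "codec as tokenizer" idea in the entry above: a neural codec encodes a waveform into short sequences of discrete token IDs that can be transmitted or fed to a speech language model, and decodes them back to audio. The example below uses EnCodec through the Hugging Face transformers API as one representative codec; the checkpoint name and dummy waveform are illustrative and this is not the specific pipeline from the paper.

```python
import numpy as np
import torch
from transformers import EncodecModel, AutoProcessor

# Load one example neural audio codec (assumed checkpoint for illustration).
model = EncodecModel.from_pretrained("facebook/encodec_24khz")
processor = AutoProcessor.from_pretrained("facebook/encodec_24khz")

# One second of silence at the codec's sampling rate stands in for real audio.
raw_audio = np.zeros(processor.sampling_rate, dtype=np.float32)
inputs = processor(raw_audio=raw_audio,
                   sampling_rate=processor.sampling_rate,
                   return_tensors="pt")

with torch.no_grad():
    # Encode: continuous waveform -> discrete codec tokens.
    encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
    codes = encoded.audio_codes  # integer IDs, shape (chunks, batch, n_quantizers, frames)

    # Decode: tokens -> reconstructed waveform.
    reconstructed = model.decode(encoded.audio_codes,
                                 encoded.audio_scales,
                                 inputs["padding_mask"])[0]
```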
no code implementations • 23 Aug 2024 • Kai-Wei Chang, Haibin Wu, Yu-Kai Wang, Yuan-Kuei Wu, Hua Shen, Wei-Cheng Tseng, Iu-thing Kang, Shang-Wen Li, Hung-Yi Lee
Motivated by the strengths of prompting, we are the first to explore the potential of prompting speech LMs in the domain of speech processing.
1 code implementation • 20 Feb 2024 • Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee
The sound codec's dual roles of minimizing data transmission latency and serving as a tokenizer underscore its critical importance.
no code implementations • 3 Jun 2023 • Haibin Wu, Kai-Wei Chang, Yuan-Kuei Wu, Hung-Yi Lee
In this paper, we present pioneering research that explores prompt tuning to stimulate speech LMs for various generation tasks within a unified framework, called SpeechGen, with around 10M trainable parameters.
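For context, the sketch below shows generic prompt tuning: a small set of trainable prompt embeddings is prepended to the inputs of a frozen language model, so only the prompt (a few million parameters) is updated. It illustrates the general technique only, not the SpeechGen architecture; `backbone` and `embed` are hypothetical stand-ins for a speech LM and its unit-embedding table.

```python
import torch
import torch.nn as nn

class PromptTunedLM(nn.Module):
    """Prepend trainable prompt vectors to a frozen LM's input embeddings."""

    def __init__(self, backbone: nn.Module, embed: nn.Embedding, prompt_len: int = 100):
        super().__init__()
        self.backbone = backbone          # hypothetical frozen speech LM over embeddings
        self.embed = embed                # its (frozen) unit-embedding table
        for p in self.backbone.parameters():
            p.requires_grad = False       # the LM itself is never updated
        dim = embed.embedding_dim
        # The only trainable parameters: prompt_len x dim prompt vectors.
        self.prompt = nn.Parameter(torch.randn(prompt_len, dim) * 0.02)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        tok = self.embed(token_ids)                                     # (B, T, D)
        prompt = self.prompt.unsqueeze(0).expand(tok.size(0), -1, -1)   # (B, P, D)
        # Condition the frozen LM on the learned prompt plus the discrete units.
        return self.backbone(torch.cat([prompt, tok], dim=1))
```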
no code implementations • 1 Apr 2022 • Wei-Tsung Kao, Yuan-Kuei Wu, Chia-Ping Chen, Zhi-Sheng Chen, Yu-Pao Tsai, Hung-Yi Lee
User-defined keyword spotting is the task of detecting new spoken terms defined by users.
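As a hedged illustration of the task setup only (not the method in this paper): in a query-by-example formulation, a user defines a keyword with a few enrollment recordings, and a test segment is flagged when its embedding is close to the enrollment centroid. The encoder `encode` and the threshold value below are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def detect_keyword(enrollment_embs: torch.Tensor,
                   test_emb: torch.Tensor,
                   threshold: float = 0.7) -> bool:
    """Accept the test segment if it is close to the user's enrolled keyword."""
    centroid = F.normalize(enrollment_embs.mean(dim=0), dim=-1)   # average enrollment embedding
    score = torch.dot(centroid, F.normalize(test_emb, dim=-1))    # cosine similarity
    return bool(score.item() >= threshold)

# Usage with a hypothetical audio encoder `encode(wav) -> (D,) embedding`:
# enrollment = torch.stack([encode(w) for w in user_recordings])
# hit = detect_keyword(enrollment, encode(stream_segment))
```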
no code implementations • 9 Dec 2019 • Chao-I Tuan, Yuan-Kuei Wu, Hung-Yi Lee, Yu Tsao
Our experimental results first confirm the robustness of MiTAS against two types of perturbations in mixed audio.