no code implementations • 17 Sep 2024 • Yao-Fei Cheng, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Wen Shen Teo, Siddhant Arora, Shinji Watanabe
However, expanding language pairs from an existing instruction-tuned ST system is costly due to the necessity of re-training on a combination of new and previous datasets.
no code implementations • 13 Sep 2024 • Yao-Fei Cheng, Li-Wei Chen, Hung-Shin Lee, Hsin-Min Wang
This study investigates the efficacy of data augmentation techniques for low-resource automatic speech recognition (ASR), focusing on two endangered Austronesian languages, Amis and Seediq.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +5
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jeeweon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled at rates in the tens of thousands per second, contain redundancies, evoking inefficiencies in sequence modeling.
1 code implementation • 27 Oct 2022 • Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.
2 code implementations • 27 Oct 2022 • Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
The lack of clean speech is a practical challenge to the development of speech enhancement systems, which means that there is an inevitable mismatch between their training criterion and evaluation metric.
no code implementations • 25 Mar 2022 • Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang
For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE.
1 code implementation • 25 Mar 2022 • Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang
In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
1 code implementation • 1 May 2021 • Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang
In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer.