no code implementations • 5 Oct 2023 • Li-Wei Chen, Kai-Chen Cheng, Hung-Shin Lee
This report provides a concise overview of the proposed North system, which aims to achieve automatic word/syllable recognition for Taiwanese Hakka (Sixian).
2 code implementations • 27 Oct 2022 • Li-Wei Chen, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
The lack of clean speech is a practical challenge in developing speech enhancement systems: it creates an inevitable mismatch between their training criterion and evaluation metric.
1 code implementation • 27 Oct 2022 • Fan-Lin Wang, Yao-Fei Cheng, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
In this study, inheriting the use of our previously constructed TAT-2mix corpus, we address the channel mismatch problem by proposing a channel-aware audio separation network (CasNet), a deep learning framework for end-to-end time-domain speech separation.
no code implementations • 1 Apr 2022 • Chiang-Lin Tai, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
Children's speech recognition is indispensable but challenging due to the diversity of children's speech.
1 code implementation • 30 Mar 2022 • Fan-Lin Wang, Hung-Shin Lee, Yu Tsao, Hsin-Min Wang
However, domain mismatch between training and test conditions due to factors such as speaker, content, channel, and environment remains a severe problem for speech separation.
no code implementations • 30 Mar 2022 • Yu-Huai Peng, Hung-Shin Lee, Pin-Tuan Huang, Hsin-Min Wang
In traditional speaker diarization systems, a well-trained speaker model is a key component to extract representations from consecutive and partially overlapping segments in a long speech session.
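As a toy illustration of that pipeline, segment-level embeddings can be grouped by speaker with a simple clustering step (the 2-means helper and the synthetic embeddings below are illustrative assumptions; real diarization systems extract embeddings with a trained speaker model and use stronger clustering):

```python
import numpy as np

# Toy sketch of the diarization pipeline described above: one embedding per
# short segment, then cluster segments by speaker (simple 2-means here).
def two_means(embeddings, iters=10):
    c = embeddings[:2].copy()  # initialize centroids from the first two segments
    for _ in range(iters):
        # assign each segment to its nearest centroid
        d = np.linalg.norm(embeddings[:, None] - c[None], axis=2)
        labels = d.argmin(axis=1)
        # move each centroid to the mean of its assigned segments
        for k in range(2):
            if (labels == k).any():
                c[k] = embeddings[labels == k].mean(axis=0)
    return labels

# Synthetic "embeddings" for two well-separated speakers (assumption).
rng = np.random.default_rng(1)
spk_a = rng.normal(0.0, 0.05, size=(5, 2))
spk_b = rng.normal(1.0, 0.05, size=(5, 2))
segments = np.vstack([spk_a[:1], spk_b[:1], spk_a[1:], spk_b[1:]])
labels = two_means(segments)
print(labels)  # segments from the same speaker receive the same cluster id
```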
no code implementations • 28 Mar 2022 • Hung-Shin Lee, Yu Tsao, Shyh-Kang Jeng, Hsin-Min Wang
Phonotactic constraints can be employed to distinguish languages by representing a speech utterance as a multinomial distribution of phone events.
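The phonotactic representation this entry mentions can be illustrated with a toy sketch (the bigram unit and helper below are illustrative assumptions, not the paper's actual features):

```python
from collections import Counter

# Hypothetical phonotactic representation: a decoded phone sequence mapped to
# a multinomial (relative-frequency) distribution over phone bigrams, which a
# language classifier could then compare across languages.
def phone_bigram_distribution(phones):
    bigrams = list(zip(phones, phones[1:]))
    counts = Counter(bigrams)
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

dist = phone_bigram_distribution(["s", "t", "a", "t", "a"])
print(dist)  # ("t", "a") occurs twice out of four bigrams, so its mass is 0.5
```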
no code implementations • 25 Mar 2022 • Hung-Shin Lee, Pin-Tuan Huang, Yao-Fei Cheng, Hsin-Min Wang
For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE.
1 code implementation • 25 Mar 2022 • Hung-Shin Lee, Pin-Yuan Chen, Yao-Fei Cheng, Yu Tsao, Hsin-Min Wang
In this paper, a noise-aware training framework based on two cascaded neural structures is proposed to jointly optimize speech enhancement and speech recognition.
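The joint optimization idea can be sketched as a weighted sum of an enhancement loss and a recognition loss (the weight and the MSE/NLL stand-ins below are illustrative assumptions, not the paper's actual training criteria):

```python
import numpy as np

# Hypothetical joint objective for a cascaded enhance-then-recognize system:
# total = w * enhancement loss + (1 - w) * recognition loss.
def joint_loss(enhanced, clean, asr_log_probs, target_ids, w=0.5):
    # Enhancement term: mean squared error against the clean waveform.
    l_enh = np.mean((enhanced - clean) ** 2)
    # Recognition term: negative log-likelihood of the reference tokens
    # (a stand-in for the CTC/attention loss an ASR back-end would use).
    l_asr = -np.mean(asr_log_probs[np.arange(len(target_ids)), target_ids])
    return w * l_enh + (1 - w) * l_asr

# Synthetic example inputs (assumptions for illustration).
rng = np.random.default_rng(0)
clean = rng.standard_normal(160)
enhanced = clean + 0.1 * rng.standard_normal(160)   # lightly noisy estimate
log_probs = np.log(np.full((4, 10), 0.1))           # uniform over 10 tokens
targets = np.array([1, 4, 2, 7])
loss = joint_loss(enhanced, clean, log_probs, targets)
print(round(loss, 3))
```

Training both stages against such a combined objective lets gradients from the recognition term shape the enhancement front-end, which is the motivation for cascading the two structures.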
Automatic Speech Recognition (ASR) +3
no code implementations • 14 Jun 2021 • Fan-Lin Wang, Yu-Huai Peng, Hung-Shin Lee, Hsin-Min Wang
DPFN is composed of two parts: the speaker module and the separation module.
no code implementations • 10 Jun 2021 • Yi-Chiao Wu, Cheng-Hung Hu, Hung-Shin Lee, Yu-Huai Peng, Wen-Chin Huang, Yu Tsao, Hsin-Min Wang, Tomoki Toda
Nowadays, neural vocoders can generate very high-fidelity speech when a large amount of training data is available.
1 code implementation • 1 May 2021 • Yao-Fei Cheng, Hung-Shin Lee, Hsin-Min Wang
In this study, we survey methods to improve ST performance without using source transcription, and propose a learning framework that utilizes a language-independent universal phone recognizer.
no code implementations • 6 Oct 2020 • Yu-Huai Peng, Cheng-Hung Hu, Alexander Kang, Hung-Shin Lee, Pin-Yuan Chen, Yu Tsao, Hsin-Min Wang
This paper describes the Academia Sinica systems for the two tasks of Voice Conversion Challenge 2020, namely voice conversion within the same language (Task 1) and cross-lingual voice conversion (Task 2).