26 Sep 2024 • Wen-Yuan Ting, Wenze Ren, Rong Chao, Hsin-Yi Lin, Yu Tsao, Fan-Gang Zeng
In particular, models like SEMamba have demonstrated the effectiveness of the Mamba architecture in single-channel speech enhancement.
22 Sep 2024 • Wenze Ren, Kuo-Hsuan Hung, Rong Chao, YouJin Li, Hsin-Min Wang, Yu Tsao
This paper addresses the prevalent issue of incorrect speech output in audio-visual speech enhancement (AVSE) systems, which is often caused by poor video quality and mismatched training and test data.
16 Sep 2024 • Wenze Ren, Haibin Wu, Yi-Cheng Lin, Xuanjun Chen, Rong Chao, Kuo-Hsuan Hung, You-Jin Li, Wen-Yuan Ting, Hsin-Min Wang, Yu Tsao
In multichannel speech enhancement, effectively capturing spatial and spectral information across different microphones is crucial for noise reduction.
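The "spatial and spectral information" mentioned above can be illustrated with a minimal, hypothetical feature-extraction sketch (not the paper's actual method): per-channel log-magnitude spectrograms capture spectral content, while inter-channel phase differences (IPD) relative to a reference microphone capture spatial cues. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Frame a 1-D signal and take the real FFT of each Hann-windowed frame."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)  # shape: (frames, freq_bins)

def spatial_spectral_features(channels, n_fft=256, hop=128):
    """Illustrative features for a microphone array:
    - log-magnitude per channel (spectral information)
    - inter-channel phase difference vs. mic 0 (spatial information)."""
    specs = np.stack([stft(c, n_fft, hop) for c in channels])   # (mics, T, F)
    log_mag = np.log1p(np.abs(specs))                           # spectral cues
    ipd = np.angle(specs[1:] * np.conj(specs[:1]))              # spatial cues
    return log_mag, ipd

# Simulate a 3-mic array: one source arriving with small integer-sample delays.
rng = np.random.default_rng(0)
sig = rng.standard_normal(4000)
mics = [np.roll(sig, d) for d in (0, 2, 4)]
log_mag, ipd = spatial_spectral_features(mics)
print(log_mag.shape, ipd.shape)  # (3, 30, 129) (2, 30, 129)
```

A neural enhancement model would typically consume a stack of such features; the sketch only shows why a multichannel front end has more to exploit than a single-channel one.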
22 Jul 2024 • Wenze Ren, Yi-Cheng Lin, Huang-Cheng Chou, Haibin Wu, Yi-Chiao Wu, Chi-Chun Lee, Hung-Yi Lee, Yu Tsao
Neural codec models reduce speech data transmission delay and serve as the foundational tokenizers for speech language models (speech LMs).
20 Sep 2023 • Shafique Ahmed, Chia-Wei Chen, Wenze Ren, Chin-Jou Li, Ernie Chu, Jun-Cheng Chen, Amir Hussain, Hsin-Min Wang, Yu Tsao, Jen-Cheng Hou
Recent studies have increasingly acknowledged the advantages of incorporating visual data into speech enhancement (SE) systems.