no code implementations • 23 Jun 2024 • Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang
Sound event localization and detection (SELD) aims to detect the occurrence of sound classes, together with their Direction of Arrival (DOA).
Room Impulse Response (RIR) • Sound Event Localization and Detection
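To make the SELD output concrete, the sketch below decodes one common output representation, the activity-coupled Cartesian DOA (ACCDOA) format. In this format each class gets a 3-D vector per frame: the vector's length signals whether the class is active, and its direction gives the DOA. This format is an illustrative assumption and is not necessarily the one used in the paper above. The function name `decode_accdoa` and the 0.5 activity threshold are hypothetical choices.

```python
import numpy as np

def decode_accdoa(output, threshold=0.5):
    """Decode ACCDOA-style SELD output (assumed format, for illustration).

    output: array of shape (frames, classes, 3), one Cartesian vector per
    class per frame. Vector length encodes activity; direction encodes DOA.
    Returns a list of (frame, class, azimuth_deg, elevation_deg) tuples.
    """
    norms = np.linalg.norm(output, axis=-1)  # activity per (frame, class)
    events = []
    # Keep only (frame, class) pairs whose vector is longer than the threshold.
    for t, c in zip(*np.nonzero(norms > threshold)):
        x, y, z = output[t, c] / norms[t, c]  # unit DOA vector
        azimuth = np.degrees(np.arctan2(y, x))
        elevation = np.degrees(np.arcsin(np.clip(z, -1.0, 1.0)))
        events.append((int(t), int(c), float(azimuth), float(elevation)))
    return events
```

For example, a unit vector along the y-axis for class 1 in frame 0 decodes to an active event at 90° azimuth and 0° elevation. A short vector (length 0.1) is treated as "class inactive" and discarded. Coupling activity and direction into one vector lets a single regression output cover both the detection and the localization subtasks.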
no code implementations • 17 Jun 2024 • Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li
In this paper, we conduct comprehensive experiments to explore the length generalization problem in Transformer-based speech enhancement.
1 code implementation • 21 May 2024 • Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps
Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivatives, particularly for the semantic-aware task.
no code implementations • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li
To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interfering speakers and non-speech signals to avoid incorrect speaker extraction.
no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li
Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.
Active Speaker Detection • Audio-Visual Active Speaker Detection
no code implementations • 16 Oct 2023 • Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li
The prevailing noise- and reverberation-resistant localization algorithms primarily focus on separating the speakers in multi-speaker scenarios and providing a directional output for each of them, without associating these outputs with speaker identities.
no code implementations • 24 May 2023 • Zhi-Hao Lai, Tian-Hao Zhang, Qi Liu, Xinyuan Qian, Li-Fang Wei, Song-Lu Chen, Feng Chen, Xu-Cheng Yin
To address these issues, this paper proposes InterFormer, which interactively fuses local and global features to learn a better representation for ASR.
Automatic Speech Recognition • Automatic Speech Recognition (ASR)
no code implementations • 23 May 2023 • Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin
The experimental results show that ASCD significantly improves the performance by leveraging both the acoustic and semantic information cooperatively.
1 code implementation • CVPR 2023 • Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li
To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results.
no code implementations • 6 Mar 2023 • Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi
The human tactile system is composed of various types of mechanoreceptors, each able to perceive and process distinct information such as force, pressure, texture, etc.
2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang
Sound source localization aims to estimate the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
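As a point of reference for the localization task described above, here is a minimal sketch of a classical two-microphone baseline. It estimates the time difference of arrival (TDOA) with GCC-PHAT and then converts that TDOA to an angle under a far-field assumption. It is not the method of the listed paper. The function names, the 343 m/s speed of sound, and the two-microphone far-field geometry are all assumptions made for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed room-temperature value

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = sig.shape[0] + ref.shape[0]  # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    R = SIG * np.conj(REF)
    R /= np.abs(R) + 1e-12  # PHAT weighting: keep phase, discard magnitude
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    # Reorder so that index 0 corresponds to lag -max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / float(fs)

def doa_from_tdoa(tau, mic_distance):
    """Map a TDOA to a broadside angle (degrees) for an assumed far-field two-mic pair."""
    s = np.clip(tau * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))
```

A positive delay means `sig` reaches its microphone later than `ref`. Passing `max_tau=mic_distance / SPEED_OF_SOUND` restricts the search to physically possible lags. A single GCC-PHAT peak resolves only one dominant source, whereas the task above asks for the DOA of all sources.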
1 code implementation • 31 Mar 2022 • Zexu Pan, Xinyuan Qian, Haizhou Li
Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech.
4 code implementations • 14 Jul 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
Active Speaker Detection • Audio-Visual Active Speaker Detection
1 code implementation • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li
Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.
Active Speaker Detection • Audio-Visual Active Speaker Detection