Search Results for author: Xinyuan Qian

Found 14 papers, 6 papers with code

Text-Queried Target Sound Event Localization

no code implementations • 23 Jun 2024 • Jinzheng Zhao, Xinyuan Qian, Yong Xu, Haohe Liu, Yin Cao, Davide Berghi, Wenwu Wang

Sound event localization and detection (SELD) aims to determine the appearance of sound classes, together with their Direction of Arrival (DOA).

Room Impulse Response (RIR), Sound Event Localization and Detection, +1

An Exploration of Length Generalization in Transformer-Based Speech Enhancement

no code implementations • 17 Jun 2024 • Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

In this paper, we conduct comprehensive experiments to explore the length generalization problem in speech enhancement with Transformer.

Position, Speech Enhancement

Mamba in Speech: Towards an Alternative to Self-Attention

1 code implementation • 21 May 2024 • Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivatives, particularly for the semantic-aware task.

Speech Enhancement, speech-recognition, +1

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

no code implementations • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interference speakers and non-speech signals to avoid incorrect speaker extraction.

Target Speaker Extraction

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

no code implementations • 16 Oct 2023 • Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of speakers.

Rethinking Speech Recognition with A Multimodal Perspective via Acoustic and Semantic Cooperative Decoding

no code implementations • 23 May 2023 • Tian-Hao Zhang, Hai-Bo Qin, Zhi-Hao Lai, Song-Lu Chen, Qi Liu, Feng Chen, Xinyuan Qian, Xu-Cheng Yin

The experimental results show that ASCD significantly improves the performance by leveraging both the acoustic and semantic information cooperatively.

Decoder, speech-recognition, +1

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

1 code implementation • CVPR 2023 • Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li

To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results.

Contrastive Learning, Lip Reading, +1

A Miniaturised Camera-based Multi-Modal Tactile Sensor

no code implementations • 6 Mar 2023 • Kaspar Althoefer, Yonggen Ling, Wanlin Li, Xinyuan Qian, Wang Wei Lee, Peng Qi

The human tactile system is composed of various types of mechanoreceptors, each able to perceive and process distinct information such as force, pressure, texture, etc.

Iterative Sound Source Localization for Unknown Number of Sources

2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang

Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.

Sound Source Localization

Speaker Extraction with Co-Speech Gestures Cue

1 code implementation • 31 Mar 2022 • Zexu Pan, Xinyuan Qian, Haizhou Li

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech.

Speech Separation
