no code implementations • 8 Oct 2024 • Ya Jiang, Hongbo Lan, Jun Du, Qing Wang, Shutong Niu
In a two-person conversation in which one participant wears smart glasses, transcribing and displaying the speaker's content in real time is an intriguing application, providing a priori information for subsequent tasks such as translation and comprehension.
no code implementations • 21 Jun 2024 • Ya Jiang, Qing Wang, Jun Du, Maocheng Hu, Pengfei Hu, Zeyan Liu, Shi Cheng, Zhaoxu Nian, Yuxuan Dong, Mingqi Cai, Xin Fang, Chin-Hui Lee
Evaluation results on the Detection and Classification of Acoustic Scenes and Events (DCASE) 2023 Challenge dataset demonstrate significant improvements in sound event localization and detection (SELD) performance.
no code implementations • 26 Feb 2024 • Massieh Kordi Boroujeny, Ya Jiang, Kai Zeng, Brian Mark
Methods for watermarking large language models have been proposed that distinguish AI-generated text from human-generated text by slightly altering the model output distribution, but they also distort the quality of the text, exposing the watermark to adversarial detection.
no code implementations • 11 Sep 2023 • Haotian Wang, Yuxuan Xi, Hang Chen, Jun Du, Yan Song, Qing Wang, Hengshun Zhou, Chenxi Wang, Jiefeng Ma, Pengfei Hu, Ya Jiang, Shi Cheng, Jie Zhang, Yuzhe Weng
Three different structures based on attention-guided feature gathering (AFG) are designed for deep feature fusion.
no code implementations • 26 Oct 2022 • Qing Wang, Hang Chen, Ya Jiang, Zhe Wang, Yuyang Wang, Jun Du, Chin-Hui Lee
In this paper, we propose a deep-learning-based method for multi-speaker direction-of-arrival (DOA) estimation that uses audio and visual signals together with a permutation-free loss function.