1 code implementation • 14 Dec 2023 • Davide Berghi, Peipei Wu, Jinzheng Zhao, Wenwu Wang, Philip J. B. Jackson
Sound event localization and detection (SELD) combines two subtasks: sound event detection (SED) and direction of arrival (DOA) estimation.
no code implementations • 11 Oct 2023 • Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Audio-visual video parsing is the task of categorizing a video at the segment level using only weak labels, predicting the events in each segment as audible or visible.
1 code implementation • 17 Jun 2023 • Yi Yuan, Haohe Liu, Xubo Liu, Xiyuan Kang, Peipei Wu, Mark D. Plumbley, Wenwu Wang
We have observed that the feature embedding extracted by the text encoder can significantly affect the performance of the generation model.