1 code implementation • 9 Feb 2024 • Zequn Yang, Yake Wei, Ce Liang, Di Hu
Moreover, our analysis reveals how a widespread issue, that models have different preferences for modalities, limits multi-modal robustness by influencing these essential components, and can make attacks on a specific modality highly effective.
1 code implementation • 12 Sep 2023 • Yake Wei, Ruoxuan Feng, Zihe Wang, Di Hu
A primary topic in multimodal learning is jointly incorporating heterogeneous information from different modalities.
no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
A comprehensive survey that systematically organizes and analyzes studies in the audio-visual field is therefore needed.
1 code implementation • CVPR 2022 • Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu
Multimodal learning helps models comprehensively understand the world by integrating different senses.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA
1 code implementation • 22 Dec 2021 • Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
To address this problem, we propose a two-stage, step-by-step learning framework that localizes and recognizes sounding objects in complex audio-visual scenarios using only the correspondence between audio and vision.