no code implementations • 20 Aug 2022 • Yake Wei, Di Hu, Yapeng Tian, Xuelong Li
A comprehensive survey that can systematically organize and analyze studies of the audio-visual field is expected.
1 code implementation • CVPR 2022 • Xiaokang Peng, Yake Wei, Andong Deng, Dong Wang, Di Hu
Multimodal learning helps to comprehensively understand the world, by integrating different senses.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #1 on
Audio-visual Question Answering
on MUSIC-AVQA
1 code implementation • 22 Dec 2021 • Di Hu, Yake Wei, Rui Qian, Weiyao Lin, Ruihua Song, Ji-Rong Wen
To address this problem, we propose a two-stage step-by-step learning framework to localize and recognize sounding objects in complex audiovisual scenarios using only the correspondence between audio and vision.