1 code implementation • 21 May 2023 • Yuanyuan Jiang, Jianqin Yin
Recent works rely on elaborate target-agnostic parsing of audio-visual scenes for spatial grounding while mistreating audio and video as separate entities for temporal grounding.
Audio-Visual Question Answering (AVQA)
1 code implementation • 11 Oct 2022 • Yuanyuan Jiang, Jianqin Yin, Yonghao Dang
In contrast to existing methods, we propose a novel video-level semantic consistency guidance network for the AVE localization task.
no code implementations • 16 Nov 2021 • Yuanyuan Jiang, Rui Ding, Tianchi Qiao, Yunan Zhu, Shi Han, Dongmei Zhang
Predictive analytics involves human decision makers, so an interpretable machine learning model is preferred.
1 code implementation • NeurIPS 2021 • Haoyue Dai, Rui Ding, Yuanyuan Jiang, Shi Han, Dongmei Zhang
Starting from the observation that supervised causal learning (SCL) is no better than random guessing if the learning target is non-identifiable a priori, we propose a two-phase paradigm for SCL that explicitly considers structure identifiability.