1 code implementation • 20 Dec 2023 • Zhangbin Li, Dan Guo, Jinxing Zhou, Jing Zhang, Meng Wang
These selected pairs are constrained to have larger similarity values than the mismatched pairs.
Audio-Visual Question Answering (AVQA)
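The similarity constraint in the entry above (selected pairs should score higher than mismatched pairs) is commonly expressed as a margin-based ranking penalty. The sketch below is one generic way to write such a constraint, not the loss from the paper: the function name `ranking_hinge_loss`, the `margin` value, and the averaging over all pair combinations are assumptions made for illustration.

```python
from typing import Sequence


def ranking_hinge_loss(
    pos_sims: Sequence[float],
    neg_sims: Sequence[float],
    margin: float = 0.2,  # assumed value, not taken from the paper
) -> float:
    """Mean hinge penalty over every (selected, mismatched) similarity pair.

    The penalty is zero once each selected pair exceeds each mismatched
    pair by at least `margin`.
    """
    if len(pos_sims) == 0 or len(neg_sims) == 0:
        raise ValueError("both similarity lists must be non-empty")
    total = 0.0
    for p in pos_sims:
        for n in neg_sims:
            # Penalize only when the gap (p - n) is smaller than the margin.
            total += max(0.0, margin - (p - n))
    return total / (len(pos_sims) * len(neg_sims))


if __name__ == "__main__":
    # Selected pairs clearly above mismatched pairs: no penalty.
    print(ranking_hinge_loss([0.9, 0.8], [0.1, 0.2]))
    # Equal similarities: the full margin is paid.
    print(ranking_hinge_loss([0.5], [0.5]))
```

A hinge form is used here because it stops pushing once the ordering holds with some slack; other formulations (for example a softmax-based contrastive loss) would enforce the same ordering.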
1 code implementation • CVPR 2023 • Xuyang Shen, Dong Li, Jinxing Zhou, Zhen Qin, Bowen He, Xiaodong Han, Aixuan Li, Yuchao Dai, Lingpeng Kong, Meng Wang, Yu Qiao, Yiran Zhong
We explore a new task for audio-visual-language modeling called fine-grained audible video description (FAVD).
no code implementations • 4 Mar 2023 • Jinxing Zhou, Dan Guo, Yiran Zhong, Meng Wang
We perform extensive experiments on the LLP dataset and demonstrate that our method can generate high-quality segment-level pseudo labels with the help of our newly proposed loss and the label denoising strategy.
1 code implementation • 30 Jan 2023 • Jinxing Zhou, Xuyang Shen, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with these problems, we propose a new baseline method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
1 code implementation • 18 Nov 2022 • Jinxing Zhou, Dan Guo, Meng Wang
Visual and audio signals often coexist in natural environments, forming audio-visual events (AVEs).
1 code implementation • 11 Jul 2022 • Jinxing Zhou, Jianyuan Wang, Jiayi Zhang, Weixuan Sun, Jing Zhang, Stan Birchfield, Dan Guo, Lingpeng Kong, Meng Wang, Yiran Zhong
To deal with the AVS problem, we propose a novel method that uses a temporal pixel-wise audio-visual interaction module to inject audio semantics as guidance for the visual segmentation process.
2 code implementations • CVPR 2021 • Jinxing Zhou, Liang Zheng, Yiran Zhong, Shijie Hao, Meng Wang
To encourage the network to extract highly correlated features for positive samples, a new audio-visual pair similarity loss is proposed.