no code implementations • 11 Oct 2023 • Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels, and predicting them as audible or visible events.
1 code implementation • 13 Sep 2023 • Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li
Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio?
1 code implementation • 10 Aug 2023 • Guangyao Li, Wenxuan Hou, Di Hu
Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, most of which may be unrelated to the given question or may even interfere with answering the content of interest.
Ranked #2 on Audio-Visual Question Answering (AVQA) on AVQA
Audio-visual Question Answering, Audio-Visual Question Answering (AVQA)
1 code implementation • 29 May 2023 • Guangyao Li, Yixin Xu, Di Hu
Audio question answering (AQA), a widely used proxy task for exploring scene understanding, has received increasing attention.
1 code implementation • 19 Mar 2023 • Chunmeng Liu, Guangyao Li, Yao Shen, Ruiqi Wang
Given a class, the initial seeds generated based on the transformer may invade regions belonging to other classes.
Weakly supervised Semantic Segmentation, Weakly-Supervised Semantic Segmentation
no code implementations • 4 Feb 2023 • Xiangrong Zhu, Guangyao Li, Wei Hu
To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to global, and absorb global knowledge back.
1 code implementation • 5 Nov 2022 • Xiaobin Tian, Zequn Sun, Guangyao Li, Wei Hu
Towards a critical evaluation of embedding-based entity alignment methods, we construct a new dataset with heterogeneous relations and attributes based on event-centric KGs.
1 code implementation • 21 Aug 2022 • Yang Liu, Zequn Sun, Guangyao Li, Wei Hu
To this end, we propose CoLE, a Co-distillation Learning method for KG Embedding that exploits the complementarity of graph structures and text information.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Ranked #5 on Audio-visual Question Answering on MUSIC-AVQA
no code implementations • 16 Mar 2022 • Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo
Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of a CNN is insufficient to capture global context information, leading to sub-optimal results.
1 code implementation • 2 Aug 2021 • Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu
By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.
Ranked #2 on Cross-Modal Retrieval on SoundingEarth
1 code implementation • 9 Sep 2020 • Xinze Lyu, Guangyao Li, Jiacheng Huang, Wei Hu
However, existing work that incorporates KGs cannot capture the explicit long-range semantics between users and items while also considering the various connectivity between items.
no code implementations • 1 Dec 2018 • Qingguo Xiao, Guangyao Li, Qiaochuan Chen
Recent advances in deep learning have shown exciting promise in filling large holes and have led to a new direction for image inpainting.
2 code implementations • 21 Nov 2018 • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.
Ranked #2 on Scene Text Detection on ICDAR 2013
no code implementations • 11 Apr 2018 • Qingguo Xiao, Guangyao Li, Li Xie, Qiaochuan Chen
In this paper, we propose a novel framework and an effective data augmentation method for deep learning.