1 code implementation • CVPR 2025 • Henghui Du, Guangyao Li, Chang Zhou, Chunjie Zhang, Alan Zhao, Di Hu
Furthermore, we visualize the process of explicit cooperation and, surprisingly, find that each LoRA head possesses a degree of audio-visual understanding ability.
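To make the LoRA-head finding above concrete, here is a minimal, generic sketch of a LoRA (low-rank adaptation) layer. This is a standard illustration of the technique, not the paper's implementation; all names and dimensions are hypothetical.

```python
# Generic LoRA layer sketch: a frozen weight W plus a trainable
# low-rank update A @ B. Not the paper's code; names are illustrative.
import numpy as np

def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ W + alpha * (x @ A @ B).
    W (d x k) stays frozen; A (d x r) and B (r x k) are trained, r << d."""
    return x @ W + alpha * (x @ A @ B)

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2                 # hidden dims and low rank
x = rng.normal(size=(1, d))
W = rng.normal(size=(d, k))       # frozen pretrained weight
A = rng.normal(size=(d, r))       # low-rank down-projection
B = np.zeros((r, k))              # B initialized to zero, so the adapter
                                  # starts as a no-op on top of W
y = lora_forward(x, W, A, B)
assert np.allclose(y, x @ W)      # with B = 0, output equals the frozen layer
```

With multiple such heads (one `(A, B)` pair per head), each head's low-rank update can be inspected separately, which is the kind of per-head analysis the visualization refers to.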
no code implementations • 16 Oct 2024 • Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li
To address these challenges, we propose an Adaptive Prompt Learning with SAM (APL-SAM) framework tailored for few-shot SPM image segmentation.
no code implementations • 5 Aug 2024 • Tongtong Feng, Qing Li, Xin Wang, Mingzi Wang, Guangyao Li, Wenwu Zhu
For image restoration, MCGF incorporates a shared encoder and a lightweight restoration module to help the backbone eliminate weather-specific information.
1 code implementation • 30 Jul 2024 • Guangyao Li, Henghui Du, Di Hu
The Audio Visual Question Answering (AVQA) task aims to answer questions related to various visual objects, sounds, and their interactions in videos.
Audio-Visual Question Answering (AVQA)
no code implementations • 15 Jul 2024 • Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu
In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues.
1 code implementation • CVPR 2025 • Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang
In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in audible videos.
no code implementations • 11 Oct 2023 • Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang
Audio-visual video parsing is the task of categorizing a video at the segment level under weak labels and predicting whether each event is audible or visible.
1 code implementation • 13 Sep 2023 • Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li
Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio?
1 code implementation • 10 Aug 2023 • Guangyao Li, Wenxuan Hou, Di Hu
Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, most of which may be unrelated to the given question or even interfere with answering the content of interest.
Ranked #2 on Audio-Visual Question Answering (AVQA) on AVQA
Audio-Visual Question Answering (AVQA)
1 code implementation • 29 May 2023 • Guangyao Li, Yixin Xu, Di Hu
Audio question answering (AQA), acting as a widely used proxy task for exploring scene understanding, has attracted increasing attention.
1 code implementation • 19 Mar 2023 • Chunmeng Liu, Guangyao Li, Yao Shen, Ruiqi Wang
Given a class, the initial seeds generated by the transformer may invade regions belonging to other classes.
Weakly-Supervised Semantic Segmentation
no code implementations • 4 Feb 2023 • Xiangrong Zhu, Guangyao Li, Wei Hu
To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to the global model and absorb global knowledge back into the local models.
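As background for the mutual-distillation idea, here is a minimal sketch of a standard knowledge-distillation loss (this is the generic technique, not the paper's implementation; names and temperatures are illustrative). Mutual distillation applies such a loss in both directions, local-to-global and global-to-local.

```python
# Generic knowledge-distillation loss sketch: the student matches the
# teacher's temperature-softened output distribution via KL divergence.
# Illustrative only, not the paper's code.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T)

s = np.array([[2.0, 0.5, -1.0]])   # student logits (hypothetical)
t = np.array([[1.8, 0.7, -0.9]])   # teacher logits (hypothetical)
loss = distill_loss(s, t)
assert loss >= 0.0                 # KL divergence is non-negative
assert distill_loss(t, t) < 1e-9   # zero when the distributions coincide
```

In a mutual scheme, the same form of loss is computed twice per round, swapping which model plays teacher and which plays student.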
1 code implementation • 5 Nov 2022 • Xiaobin Tian, Zequn Sun, Guangyao Li, Wei Hu
Towards a critical evaluation of embedding-based entity alignment methods, we construct a new dataset with heterogeneous relations and attributes based on event-centric KGs.
1 code implementation • 21 Aug 2022 • Yang Liu, Zequn Sun, Guangyao Li, Wei Hu
To this end, we propose CoLE, a Co-distillation Learning method for KG Embedding that exploits the complementarity of graph structures and text information.
1 code implementation • CVPR 2022 • Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.
Audio-Visual Learning
Audio-Visual Question Answering (AVQA)
no code implementations • 16 Mar 2022 • Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo
Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNNs is insufficient to capture global context information, leading to sub-optimal results.
1 code implementation • 2 Aug 2021 • Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu
By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.
Ranked #2 on Cross-Modal Retrieval on SoundingEarth
1 code implementation • 9 Sep 2020 • Xinze Lyu, Guangyao Li, Jiacheng Huang, Wei Hu
However, existing KG-based work cannot capture explicit long-range semantics between users and items while also accounting for the varied connectivity between items.
no code implementations • 1 Dec 2018 • Qingguo Xiao, Guangyao Li, Qiaochuan Chen
Recent advances in deep learning have shown exciting promise in filling large holes, opening a new direction for image inpainting.
2 code implementations • 21 Nov 2018 • Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li
We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.
Ranked #2 on Scene Text Detection on ICDAR 2013
no code implementations • 11 Apr 2018 • Qingguo Xiao, Guangyao Li, Li Xie, Qiaochuan Chen
We propose a novel framework and an effective data augmentation method for deep learning in this paper.