Search Results for author: Guangyao Li

Found 21 papers, 13 papers with code

Crab: A Unified Audio-Visual Scene Understanding Model with Explicit Cooperation

1 code implementation CVPR 2025 Henghui Du, Guangyao Li, Chang Zhou, Chunjie Zhang, Alan Zhao, Di Hu

Furthermore, we also visualize the process of explicit cooperation and surprisingly find that each LoRA head has certain audio-visual understanding ability.

Data Interaction Scene Understanding +2

Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation

no code implementations 16 Oct 2024 Yao Shen, Ziwei Wei, Chunmeng Liu, Shuming Wei, Qi Zhao, Kaiyang Zeng, Guangyao Li

To address these challenges, we propose an Adaptive Prompt Learning with SAM (APL-SAM) framework tailored for few-shot SPM image segmentation.

Image Segmentation Prompt Learning +2

Multi-weather Cross-view Geo-localization Using Denoising Diffusion Models

no code implementations 5 Aug 2024 Tongtong Feng, Qing Li, Xin Wang, Mingzi Wang, Guangyao Li, Wenwu Zhu

For image restoration, MCGF incorporates a shared encoder and a lightweight restoration module to help the backbone eliminate weather-specific information.

Denoising geo-localization +1

Boosting Audio Visual Question Answering via Key Semantic-Aware Cues

1 code implementation 30 Jul 2024 Guangyao Li, Henghui Du, Di Hu

The Audio Visual Question Answering (AVQA) task aims to answer questions related to various visual objects, sounds, and their interactions in videos.

Audio-visual Question Answering Audio-Visual Question Answering (AVQA) +2

Ref-AVS: Refer and Segment Objects in Audio-Visual Scenes

no code implementations 15 Jul 2024 Yaoting Wang, Peiwen Sun, Dongzhan Zhou, Guangyao Li, Honggang Zhang, Di Hu

In this work, we introduce a novel task called Reference Audio-Visual Segmentation (Ref-AVS), which seeks to segment objects within the visual domain based on expressions containing multimodal cues.

Segmentation

Audio-Visual Instance Segmentation

1 code implementation CVPR 2025 Ruohao Guo, Xianghua Ying, Yaru Chen, Dantong Niu, Guangyao Li, Liao Qu, Yanyu Qi, Jinxing Zhou, Bowei Xing, Wenzhen Yue, Ji Shi, Qixun Wang, Peiliang Zhang, Buwen Liang

In this paper, we propose a new multi-modal task, termed audio-visual instance segmentation (AVIS), which aims to simultaneously identify, segment and track individual sounding object instances in audible videos.

Instance Segmentation Segmentation +2

CM-PIE: Cross-modal perception for interactive-enhanced audio-visual video parsing

no code implementations 11 Oct 2023 Yaru Chen, Ruohao Guo, Xubo Liu, Peipei Wu, Guangyao Li, Zhenbo Li, Wenwu Wang

Audio-visual video parsing is the task of categorizing a video at the segment level with weak labels, and predicting them as audible or visible events.

Prompting Segmentation with Sound Is Generalizable Audio-Visual Source Localizer

1 code implementation 13 Sep 2023 Yaoting Wang, Weisong Liu, Guangyao Li, Jian Ding, Di Hu, Xi Li

Never having seen an object and heard its sound simultaneously, can the model still accurately localize its visual position from the input audio?

CoLA Decoder +1

Progressive Spatio-temporal Perception for Audio-Visual Question Answering

1 code implementation 10 Aug 2023 Guangyao Li, Wenxuan Hou, Di Hu

Such naturally multi-modal videos are composed of rich and complex dynamic audio-visual components, most of which may be unrelated to the given question or may even interfere with answering the content of interest.

Audio-visual Question Answering Audio-Visual Question Answering (AVQA) +2

Multi-Scale Attention for Audio Question Answering

1 code implementation 29 May 2023 Guangyao Li, Yixin Xu, Di Hu

Audio question answering (AQA), a widely used proxy task for exploring scene understanding, has attracted increasing attention.

Audio Question Answering Question Answering +2

Heterogeneous Federated Knowledge Graph Embedding Learning and Unlearning

no code implementations 4 Feb 2023 Xiangrong Zhu, Guangyao Li, Wei Hu

To cope with the drift between local optimization and global convergence caused by data heterogeneity, we propose mutual knowledge distillation to transfer local knowledge to global, and absorb global knowledge back.

Federated Learning Knowledge Distillation +2

EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs

1 code implementation 5 Nov 2022 Xiaobin Tian, Zequn Sun, Guangyao Li, Wei Hu

Towards a critical evaluation of embedding-based entity alignment methods, we construct a new dataset with heterogeneous relations and attributes based on event-centric KGs.

Attribute Benchmarking +2

I Know What You Do Not Know: Knowledge Graph Embedding via Co-distillation Learning

1 code implementation 21 Aug 2022 Yang Liu, Zequn Sun, Guangyao Li, Wei Hu

To this end, we propose CoLE, a Co-distillation Learning method for KG Embedding that exploits the complementarity of graph structures and text information.

Knowledge Graph Embedding Language Modelling

Learning to Answer Questions in Dynamic Audio-Visual Scenarios

1 code implementation CVPR 2022 Guangyao Li, Yake Wei, Yapeng Tian, Chenliang Xu, Ji-Rong Wen, Di Hu

In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos.

audio-visual learning Audio-Visual Question Answering (AVQA) +4

WegFormer: Transformers for Weakly Supervised Semantic Segmentation

no code implementations 16 Mar 2022 Chunmeng Liu, Enze Xie, Wenjia Wang, Wenhai Wang, Guangyao Li, Ping Luo

Although convolutional neural networks (CNNs) have achieved remarkable progress in weakly supervised semantic segmentation (WSSS), the effective receptive field of CNN is insufficient to capture global context information, leading to sub-optimal results.

Segmentation Weakly supervised Semantic Segmentation +1

Self-supervised Audiovisual Representation Learning for Remote Sensing Data

1 code implementation 2 Aug 2021 Konrad Heidler, Lichao Mou, Di Hu, Pu Jin, Guangyao Li, Chuang Gan, Ji-Rong Wen, Xiao Xiang Zhu

By fine-tuning the models on a number of commonly used remote sensing datasets, we show that our approach outperforms existing pre-training strategies for remote sensing imagery.

Cross-Modal Retrieval Representation Learning +1

Rule-Guided Graph Neural Networks for Recommender Systems

1 code implementation 9 Sep 2020 Xinze Lyu, Guangyao Li, Jiacheng Huang, Wei Hu

However, existing work that incorporates KGs cannot capture the explicit long-range semantics between users and items while also considering the various connectivity between items.

Collaborative Filtering Knowledge Graphs +1

Deep Inception Generative Network for Cognitive Image Inpainting

no code implementations 1 Dec 2018 Qingguo Xiao, Guangyao Li, Qiaochuan Chen

Recent advances in deep learning have shown exciting promise in filling large holes and have led to a new direction for image inpainting.

Attribute Image Inpainting

Scene Text Detection with Supervised Pyramid Context Network

2 code implementations 21 Nov 2018 Enze Xie, Yuhang Zang, Shuai Shao, Gang Yu, Cong Yao, Guangyao Li

We propose a supervised pyramid context network (SPCNET) to precisely locate text regions while suppressing false positives.

Diversity Instance Segmentation +3
