Search Results for author: Yunzhi Zhuge

Found 17 papers, 15 papers with code

Learning Universal Features for Generalizable Image Forgery Localization

1 code implementation · 10 Apr 2025 · Hengrun Zhao, Yunzhi Zhuge, Yifan Wang, Lijun Wang, Huchuan Lu, Yu Zeng

In recent years, advanced image editing and generation methods have evolved rapidly, making it increasingly challenging to detect and localize forged image content.

Image Forgery Detection

Streaming Video Understanding and Multi-round Interaction with Memory-enhanced Knowledge

1 code implementation · 23 Jan 2025 · Haomiao Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu

Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks.

Scheduling · Video Understanding

The Devil is in Temporal Token: High Quality Video Reasoning Segmentation

1 code implementation · 15 Jan 2025 · Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu

Existing methods for Video Reasoning Segmentation rely heavily on a single special token to represent the object in the keyframe or the entire video, inadequately capturing spatial complexity and inter-frame motion.

Reasoning Segmentation · Referring Expression Segmentation · +2

3UR-LLM: An End-to-End Multimodal Large Language Model for 3D Scene Understanding

1 code implementation · 14 Jan 2025 · Haomiao Xiong, Yunzhi Zhuge, Jiawen Zhu, Lu Zhang, Huchuan Lu

Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representations.

Language Modeling · Language Modelling · +3

Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation

1 code implementation · 14 Jan 2025 · Yunzhi Zhuge, Hongyu Gu, Lu Zhang, Jinqing Qi, Huchuan Lu

In this paper, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues.

Object · object-detection · +6

AVS-Mamba: Exploring Temporal and Multi-modal Mamba for Audio-Visual Segmentation

1 code implementation · 14 Jan 2025 · Sitong Gong, Yunzhi Zhuge, Lu Zhang, Yifan Wang, Pingping Zhang, Lijun Wang, Huchuan Lu

To perform multi-modal fusion, we propose the Modality Aggregation Decoder, leveraging the Vision-to-Audio Fusion Block to integrate visual features into audio features across both frame and temporal levels.

Mamba · Video Understanding

Towards Open-Vocabulary Remote Sensing Image Semantic Segmentation

1 code implementation · 27 Dec 2024 · Chengyang Ye, Yunzhi Zhuge, Pingping Zhang

In this work, we introduce Open-Vocabulary Remote Sensing Image Semantic Segmentation (OVRSISS), which aims to segment arbitrary semantic classes in remote sensing images.

Image Segmentation · Semantic Segmentation

DreamMix: Decoupling Object Attributes for Enhanced Editability in Customized Image Inpainting

1 code implementation · 26 Nov 2024 · Yicheng Yang, Pengxiang Li, Lu Zhang, Liqian Ma, Ping Hu, Siyu Du, Yunzhi Zhuge, Xu Jia, Huchuan Lu

Extensive experiments demonstrate that DreamMix effectively balances identity preservation and attribute editability across various application scenarios, including object insertion, attribute editing, and small object inpainting.

Attribute · Diversity · +2

LLMs Can Evolve Continually on Modality for X-Modal Reasoning

1 code implementation · 26 Oct 2024 · Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen

Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding.

Continual Learning · multimodal interaction

SHERL: Synthesizing High Accuracy and Efficient Memory for Resource-Limited Transfer Learning

1 code implementation · 10 Jul 2024 · Haiwen Diao, Bo Wan, Xu Jia, Yunzhi Zhuge, Ying Zhang, Huchuan Lu, Long Chen

Parameter-efficient transfer learning (PETL) has emerged as a flourishing research field for adapting large pre-trained models to downstream tasks, greatly reducing trainable parameters while grappling with memory challenges during fine-tuning.

Transfer Learning

Boosting Continual Learning of Vision-Language Models via Mixture-of-Experts Adapters

2 code implementations · CVPR 2024 · Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He

Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset.

Continual Learning · Incremental Learning · +3

StableIdentity: Inserting Anybody into Anywhere at First Sight

1 code implementation · 29 Jan 2024 · Qinghe Wang, Xu Jia, Xiaomin Li, Taiqing Li, Liqian Ma, Yunzhi Zhuge, Huchuan Lu

We believe that the proposed StableIdentity is an important step to unify image, video, and 3D customized generation models.

3D Generation

Multi-source weak supervision for saliency detection

1 code implementation · CVPR 2019 · Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang, Mingyang Qian, Yizhou Yu

To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources.

Caption Generation · Saliency Prediction

Boundary-guided Feature Aggregation Network for Salient Object Detection

no code implementations · 28 Sep 2018 · Yunzhi Zhuge, Pingping Zhang, Huchuan Lu

Fully convolutional networks (FCNs) have significantly improved the performance of many pixel-labeling tasks, such as semantic segmentation and depth estimation.

Depth Estimation · Object · +4
