1 code implementation • 10 Apr 2025 • Hengrun Zhao, Yunzhi Zhuge, Yifan Wang, Lijun Wang, Huchuan Lu, Yu Zeng
In recent years, advanced image editing and generation methods have evolved rapidly, making the detection and localization of forged image content increasingly challenging.
1 code implementation • 23 Jan 2025 • Haomiao Xiong, Zongxin Yang, Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Jiawen Zhu, Huchuan Lu
Recent advances in Large Language Models (LLMs) have enabled the development of Video-LLMs, advancing multimodal learning by bridging video data with language tasks.
1 code implementation • 15 Jan 2025 • Sitong Gong, Yunzhi Zhuge, Lu Zhang, Zongxin Yang, Pingping Zhang, Huchuan Lu
Existing methods for Video Reasoning Segmentation rely heavily on a single special token to represent the object in the keyframe or the entire video, inadequately capturing spatial complexity and inter-frame motion.
Ranked #1 on Referring Video Object Segmentation on ReVOS
1 code implementation • 14 Jan 2025 • Haomiao Xiong, Yunzhi Zhuge, Jiawen Zhu, Lu Zhang, Huchuan Lu
Multi-modal Large Language Models (MLLMs) exhibit impressive capabilities in 2D tasks, yet encounter challenges in discerning the spatial positions, interrelations, and causal logic in scenes when transitioning from 2D to 3D representations.
1 code implementation • 14 Jan 2025 • Yunzhi Zhuge, Hongyu Gu, Lu Zhang, Jinqing Qi, Huchuan Lu
In this paper, we address the challenges in unsupervised video object segmentation (UVOS) by proposing an efficient algorithm, termed MTNet, which concurrently exploits motion and temporal cues.
1 code implementation • 14 Jan 2025 • Sitong Gong, Yunzhi Zhuge, Lu Zhang, Yifan Wang, Pingping Zhang, Lijun Wang, Huchuan Lu
To perform multi-modal fusion, we propose the Modality Aggregation Decoder, leveraging the Vision-to-Audio Fusion Block to integrate visual features into audio features across both frame and temporal levels.
1 code implementation • 27 Dec 2024 • Chengyang Ye, Yunzhi Zhuge, Pingping Zhang
In this work, we introduce Open-Vocabulary Remote Sensing Image Semantic Segmentation (OVRSISS), which aims to segment arbitrary semantic classes in remote sensing images.
no code implementations • 29 Nov 2024 • Wenbo Zhang, Lu Zhang, Ping Hu, Liqian Ma, Yunzhi Zhuge, Huchuan Lu
Injecting semantics into 3D Gaussian Splatting (3DGS) has recently garnered significant attention.
1 code implementation • 26 Nov 2024 • Yicheng Yang, Pengxiang Li, Lu Zhang, Liqian Ma, Ping Hu, Siyu Du, Yunzhi Zhuge, Xu Jia, Huchuan Lu
Extensive experiments demonstrate that DreamMix effectively balances identity preservation and attribute editability across various application scenarios, including object insertion, attribute editing, and small object inpainting.
1 code implementation • 26 Oct 2024 • Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen
Multimodal Large Language Models (MLLMs) have gained significant attention due to their impressive capabilities in multimodal understanding.
1 code implementation • 10 Jul 2024 • Haiwen Diao, Bo Wan, Xu Jia, Yunzhi Zhuge, Ying Zhang, Huchuan Lu, Long Chen
Parameter-efficient transfer learning (PETL) has emerged as a flourishing research field for adapting large pre-trained models to downstream tasks, greatly reducing trainable parameters while grappling with memory challenges during fine-tuning.
2 code implementations • CVPR 2024 • Jiazuo Yu, Yunzhi Zhuge, Lu Zhang, Ping Hu, Dong Wang, Huchuan Lu, You He
Continual learning can empower vision-language models to continuously acquire new knowledge, without the need for access to the entire historical dataset.
1 code implementation • 29 Jan 2024 • Qinghe Wang, Xu Jia, Xiaomin Li, Taiqing Li, Liqian Ma, Yunzhi Zhuge, Huchuan Lu
We believe that the proposed StableIdentity is an important step to unify image, video, and 3D customized generation models.
1 code implementation • ICCV 2023 • Kaining Ying, Qing Zhong, Weian Mao, Zhenhua Wang, Hao Chen, Lin Yuanbo Wu, Yifan Liu, Chengxiang Fan, Yunzhi Zhuge, Chunhua Shen
The discrimination of instance embeddings plays a vital role in associating instances across time for online video instance segmentation (VIS).
Ranked #3 on Video Instance Segmentation on YouTube-VIS 2022 Validation (using extra training data)
1 code implementation • ICCV 2019 • Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang
SSNet consists of a segmentation network (SN) and a saliency aggregation module (SAM).
1 code implementation • CVPR 2019 • Yu Zeng, Yunzhi Zhuge, Huchuan Lu, Lihe Zhang, Mingyang Qian, Yizhou Yu
To this end, we propose a unified framework to train saliency detection models with diverse weak supervision sources.
no code implementations • 28 Sep 2018 • Yunzhi Zhuge, Pingping Zhang, Huchuan Lu
Fully convolutional networks (FCNs) have significantly improved the performance of many pixel-labeling tasks, such as semantic segmentation and depth estimation.