1 code implementation • 17 Dec 2024 • Mingjia Shi, Yuhao Zhou, Ruiji Yu, Zekai Li, Zhiyuan Liang, Xuanlei Zhao, Xiaojiang Peng, Tanmay Rajpurohit, Shanmukha Ramakrishna Vedantam, Wangbo Zhao, Kai Wang, Yang You
Re-training the token-reduced model enhances the performance of Mamba, by effectively rebuilding the key knowledge.
1 code implementation • 4 Dec 2024 • Wangbo Zhao, Yizeng Han, Jiasheng Tang, Zhikai Li, Yibing Song, Kai Wang, Zhangyang Wang, Yang You
Vision-language models (VLMs) have shown remarkable success across various multi-modal tasks, yet large VLMs encounter significant efficiency challenges due to processing numerous visual tokens.
Ranked #19 on Visual Question Answering on MM-Vet
1 code implementation • 4 Oct 2024 • Wangbo Zhao, Yizeng Han, Jiasheng Tang, Kai Wang, Yibing Song, Gao Huang, Fan Wang, Yang You
In addition, we design a Spatial-wise Dynamic Token (SDT) strategy to avoid redundant computation at unnecessary spatial locations.
1 code implementation • 6 Aug 2024 • Zekai Li, Ziyao Guo, Wangbo Zhao, Tianle Zhang, Zhi-Qi Cheng, Samir Khaki, Kaipeng Zhang, Ahmad Sajedi, Konstantinos N Plataniotis, Kai Wang, Yang You
To achieve this, existing methods use the agent model to extract information from the target dataset and embed it into the distilled dataset.
1 code implementation • 6 May 2024 • Zheng Zhu, XiaoFeng Wang, Wangbo Zhao, Chen Min, Nianchen Deng, Min Dou, Yuqi Wang, Botian Shi, Kai Wang, Chi Zhang, Yang You, Zhaoxiang Zhang, Dawei Zhao, Liang Xiao, Jian Zhao, Jiwen Lu, Guan Huang
General world models represent a crucial pathway toward achieving Artificial General Intelligence (AGI), serving as the cornerstone for various applications ranging from virtual environments to decision-making systems.
1 code implementation • 18 Mar 2024 • Wangbo Zhao, Jiasheng Tang, Yizeng Han, Yibing Song, Kai Wang, Gao Huang, Fan Wang, Yang You
Existing parameter-efficient fine-tuning (PEFT) methods have achieved significant success on vision transformers (ViTs) adaptation by improving parameter efficiency.
1 code implementation • CVPR 2024 • Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
Salient object detection (SOD) and camouflaged object detection (COD) are related yet distinct binary mapping tasks.
1 code implementation • ICCV 2023 • Nian Liu, Kepan Nan, Wangbo Zhao, Yuanwei Liu, Xiwen Yao, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Junwei Han, Fahad Shahbaz Khan
We decompose the query video information into a clip prototype and a memory prototype for capturing local and long-term internal temporal guidance, respectively.
no code implementations • 4 Aug 2023 • Wangbo Zhao, Kepan Nan, Songyang Zhang, Kai Chen, Dahua Lin, Yang You
Based on this scheme, we develop a novel RVOS method that exploits weak annotations effectively.
3 code implementations • 12 Jul 2023 • YuAn Liu, Haodong Duan, Yuanhan Zhang, Bo Li, Songyang Zhang, Wangbo Zhao, Yike Yuan, Jiaqi Wang, Conghui He, Ziwei Liu, Kai Chen, Dahua Lin
In response to these challenges, we propose MMBench, a bilingual benchmark for assessing the multi-modal capabilities of VLMs.
1 code implementation • CVPR 2022 • Wangbo Zhao, Kai Wang, Xiangxiang Chu, Fuzhao Xue, Xinchao Wang, Yang You
Text-based video segmentation aims to segment the target object in a video based on a describing sentence.
Ranked #10 on Referring Expression Segmentation on A2D Sentences
Optical Flow Estimation Referring Expression Segmentation +4
1 code implementation • 2 Oct 2021 • Nian Liu, Wangbo Zhao, Dingwen Zhang, Junwei Han, Ling Shao
On the other hand, instead of processing the twokinds of data separately, we build a novel dual graph modelto guide the focal stack fusion process using all-focus pat-terns.
no code implementations • 8 Jul 2021 • Nian Liu, Long Li, Wangbo Zhao, Junwei Han, Ling Shao
Conventional salient object detection models cannot differentiate the importance of different salient objects.
1 code implementation • CVPR 2021 • Wangbo Zhao, Jing Zhang, Long Li, Nick Barnes, Nian Liu, Junwei Han
Significant performance improvement has been achieved for fully-supervised video salient object detection with the pixel-wise labeled training datasets, which are time-consuming and expensive to obtain.
no code implementations • ICCV 2021 • Nian Liu, Wangbo Zhao, Dingwen Zhang, Junwei Han, Ling Shao
In this paper, we model the information fusion within focal stack via graph networks.