no code implementations • 23 Feb 2024 • Junlin Xie, Zhihong Chen, Ruifei Zhang, Xiang Wan, Guanbin Li
In this paper, we conduct a systematic review of LLM-driven multimodal agents, which we refer to as large multimodal agents ( LMAs for short).
1 code implementation • 18 Feb 2024 • Guiming Hardy Chen, Shunian Chen, Ruifei Zhang, Junying Chen, Xiangbo Wu, Zhiyi Zhang, Zhihong Chen, Jianquan Li, Xiang Wan, Benyou Wang
Recent advancements in Large Vision-Language Models (LVLMs) have enabled processing of multimodal inputs in language models but require significant computational resources for deployment, especially in edge devices.
1 code implementation • CVPR 2023 • Zhihong Chen, Ruifei Zhang, Yibing Song, Xiang Wan, Guanbin Li
Therefore, in this paper, we propose a novel benchmark of \underline{S}cene \underline{K}nowledge-guided \underline{V}isual \underline{G}rounding (SK-VG), where the image content and referring expressions are not sufficient to ground the target objects, forcing the models to have a reasoning ability on the long-form scene knowledge.
1 code implementation • 12 Jan 2023 • Ruifei Zhang, Peiwen Lai, Xiang Wan, De-Jun Fan, Feng Gao, Xiao-Jian Wu, Guanbin Li
Automatic and accurate polyp segmentation plays an essential role in early colorectal cancer diagnosis.
1 code implementation • 12 Jan 2023 • Ruifei Zhang, Sishuo Liu, Yizhou Yu, Guanbin Li
Since the two tasks rely on similar feature information, the unlabeled data effectively enhances the representation of the network to the lesion regions and further improves the segmentation performance.
1 code implementation • 12 Jan 2023 • Ruifei Zhang, Guanbin Li, Zhen Li, Shuguang Cui, Dahong Qian, Yizhou Yu
To tackle these issues, we propose an adaptive context selection based encoder-decoder framework which is composed of Local Context Attention (LCA) module, Global Context Module (GCM) and Adaptive Selection Module (ASM).