no code implementations • 14 Jan 2025 • Chen Tang, Bo Lv, Zifan Zheng, Bohao Yang, Kun Zhao, Ning Liao, Xiaoxing Wang, Feiyu Xiong, Zhiyu Li, Nayu Liu, Jingchi Jiang
Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in the reasoning capabilities of language models.
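The listing gives no further details of the routing mechanism; below is a minimal sketch of one plausible reading, in which a mixture-of-experts-style router is applied recurrently so the same token can be re-routed over several refinement steps. All module names, dimensions, and the n_steps parameter are illustrative assumptions, not the paper's design.

    import torch
    import torch.nn as nn

    class RecurrentRouter(nn.Module):
        """Hypothetical sketch: route tokens through a small expert pool
        repeatedly, so routing decisions can change as the state evolves."""

        def __init__(self, d_model=64, n_experts=4, n_steps=3):
            super().__init__()
            self.experts = nn.ModuleList(
                [nn.Linear(d_model, d_model) for _ in range(n_experts)])
            self.gate = nn.Linear(d_model, n_experts)  # per-token expert scores
            self.n_steps = n_steps

        def forward(self, x):                       # x: (batch, seq, d_model)
            for _ in range(self.n_steps):           # recurrent application
                weights = torch.softmax(self.gate(x), dim=-1)         # (B,S,E)
                outs = torch.stack([e(x) for e in self.experts], -1)  # (B,S,D,E)
                x = x + (outs * weights.unsqueeze(-2)).sum(-1)        # residual
            return x

    router = RecurrentRouter()
    print(router(torch.randn(2, 5, 64)).shape)  # torch.Size([2, 5, 64])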
1 code implementation • 4 Nov 2024 • Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Shaofeng Zhang, Yi Yu, Wenxian Yu, Junchi Yan
In this paper, we put forth a novel formulation of the aerial object detection problem, namely open-vocabulary aerial object detection (OVAD), which can detect objects beyond the training categories without the costly collection of new labeled data.
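As a rough sketch of the open-vocabulary ingredient described here, class-agnostic region features can be scored against text embeddings of category names, so new categories only require new text prompts rather than new labeled data. The dimensions and category list below are placeholders; this is not the paper's full detector.

    import torch
    import torch.nn.functional as F

    def classify_regions(region_feats, text_embeds, categories):
        """Score class-agnostic region features against text embeddings of
        category names; swapping in new names needs no new labeled data.
        region_feats: (R, D) proposal features; text_embeds: (C, D)."""
        region_feats = F.normalize(region_feats, dim=-1)
        text_embeds = F.normalize(text_embeds, dim=-1)
        sims = region_feats @ text_embeds.T        # cosine similarities (R, C)
        scores, idx = sims.max(dim=-1)
        return [(categories[i], round(s.item(), 3)) for i, s in zip(idx, scores)]

    # Toy usage with random features and three aerial categories.
    cats = ["plane", "ship", "storage-tank"]
    print(classify_regions(torch.randn(4, 512), torch.randn(3, 512), cats))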
1 code implementation • 24 Sep 2024 • Jiacheng Ruan, Wenzhen Yuan, Zehao Lin, Ning Liao, Zhiyu Li, Feiyu Xiong, Ting Liu, Yuzhuo Fu
CamObj-Instruct is collected for fine-tuning the LVLMs with improved instruction-following capabilities, and it includes 11,363 images and 68,849 conversations with diverse instructions.
1 code implementation • 20 Nov 2023 • Yan Li, Weiwei Guo, Xue Yang, Ning Liao, Dunyun He, Jiaqi Zhou, Wenxian Yu
The performance of open-vocabulary detection (OVD) relies heavily on the quality of class-agnostic region proposals and of pseudo-labels for novel object categories.
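A minimal sketch of the pseudo-labelling step this sentence refers to, under the common CLIP-style assumption that proposals are matched to novel-category text embeddings and kept above a confidence threshold; the threshold and scoring function are illustrative, not taken from the paper.

    import torch
    import torch.nn.functional as F

    def pseudo_label(proposal_feats, novel_text_embeds, novel_names, thresh=0.3):
        """Keep proposals whose best similarity to a novel-category text
        embedding clears a confidence threshold; survivors become
        pseudo-labels for training on the novel categories."""
        sims = (F.normalize(proposal_feats, dim=-1)
                @ F.normalize(novel_text_embeds, dim=-1).T)   # (R, C)
        scores, idx = sims.max(dim=-1)
        return [(novel_names[i], round(s.item(), 3))
                for i, s in zip(idx, scores) if s > thresh]

    # Toy usage; a low threshold is used so the random demo keeps something.
    print(pseudo_label(torch.randn(8, 256), torch.randn(2, 256),
                       ["harbor", "bridge"], thresh=0.05))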
no code implementations • 10 Oct 2023 • Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan
Instead of evaluating the models directly, in this paper we evaluate the Vision-Language Instruction-Tuning (VLIT) datasets themselves.
no code implementations • 9 Mar 2023 • Ning Liao, Xiaopeng Zhang, Min Cao, Junchi Yan
In realistic open-set scenarios, where the labels of part of the test data are entirely unknown, vision-language (VL) prompt learning methods that encounter inputs from unknown classes (i.e., classes not seen during training) invariably predict them as one of the training classes.
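To make the failure mode concrete, the sketch below shows a closed-set prompt classifier together with one standard remedy: rejecting inputs whose best prompt similarity falls under a threshold. The threshold value is an assumption for illustration, not the method proposed in the paper.

    import torch
    import torch.nn.functional as F

    def predict_open_set(image_feat, class_text_embeds, class_names, reject=0.25):
        """A closed-set prompt classifier always returns a training class;
        thresholding the best prompt similarity lets it flag unknowns."""
        sims = (F.normalize(image_feat, dim=-1)
                @ F.normalize(class_text_embeds, dim=-1).T)   # (1, C)
        score, idx = sims.max(dim=-1)
        if score.item() < reject:
            return "unknown"        # likely an unseen class: refuse to guess
        return class_names[idx.item()]

    # Random features rarely match any prompt well, so this prints "unknown".
    print(predict_open_set(torch.randn(1, 512), torch.randn(3, 512),
                           ["cat", "dog", "car"]))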
no code implementations • 9 Mar 2023 • Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian
To explore prompt learning on generative pre-trained visual models while keeping task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM), which transforms downstream visual classification into the pre-trained masked visual token prediction task.
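A heavily simplified sketch of the stated idea: learnable prompt tokens are prepended to the visual token sequence, and classification is recast as predicting the codebook token at a masked position, so the downstream task matches the pre-training objective. The encoder, codebook size, and dimensions below are placeholders standing in for the actual pre-trained model.

    import torch
    import torch.nn as nn

    class VPTMSketch(nn.Module):
        """Classification recast as masked visual token prediction: the
        encoder fills in a [MASK] slot, and the predicted codebook token
        is later mapped to a class label."""

        def __init__(self, d=64, n_prompts=5, vocab=1024):
            super().__init__()
            self.n_prompts = n_prompts
            self.prompts = nn.Parameter(torch.randn(n_prompts, d))
            self.mask_token = nn.Parameter(torch.randn(1, d))
            layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
            # Stands in for the frozen pre-trained encoder; in practice only
            # the prompt tokens would be tuned.
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.to_vocab = nn.Linear(d, vocab)   # token prediction head

        def forward(self, patch_tokens):          # patch_tokens: (B, N, d)
            b = patch_tokens.size(0)
            x = torch.cat([self.prompts.expand(b, -1, -1),
                           self.mask_token.expand(b, -1, -1),
                           patch_tokens], dim=1)
            x = self.encoder(x)
            return self.to_vocab(x[:, self.n_prompts])  # logits at [MASK]

    m = VPTMSketch()
    print(m(torch.randn(2, 16, 64)).shape)  # torch.Size([2, 1024])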