no code implementations • 12 May 2025 • Zexian Yang, Dian Li, Dayan Wu, Gang Liu, Weiping Wang
Despite significant advancements in multimodal reasoning tasks, existing Large Vision-Language Models (LVLMs) are prone to producing visually ungrounded responses when interpreting associated images.
no code implementations • CVPR 2024 • Zexian Yang, Dayan Wu, Chenming Wu, Zheng Lin, Jingzi Gu, Weiping Wang
Witnessing the impressive multimodal understanding capabilities of the vision-language foundation model CLIP, a recent two-stage CLIP-based method employs automated prompt engineering to obtain specific textual labels for classifying pedestrians.
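For context, a minimal sketch of the kind of CLIP-based zero-shot labeling such methods build on, using the HuggingFace transformers CLIP API; the attribute prompts and image path below are illustrative placeholders, not the paper's actual prompts or pipeline:

    # Minimal sketch: score a pedestrian crop against candidate textual labels
    # with CLIP. Labels and image path are hypothetical examples.
    from PIL import Image
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    labels = [
        "a photo of a pedestrian wearing a red jacket",
        "a photo of a pedestrian carrying a backpack",
        "a photo of a pedestrian in a black coat",
    ]
    image = Image.open("pedestrian.jpg")  # hypothetical input crop

    # Embed image and texts jointly; higher logits mean better image-text match.
    inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
    print(labels[probs.argmax().item()])  # most compatible textual label

A two-stage method in this vein would first generate such labels automatically, then use the selected text as guidance for the downstream re-identification model.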
no code implementations • 17 Oct 2022 • Zexian Yang, Dayan Wu, Wanqian Zhang, Bo Li, Weiping Wang
Specifically, new data collected from new cameras may well contain an unknown proportion of identities seen before.