1 code implementation • 28 Apr 2024 • Zhiqi Ge, Hongzhe Huang, Mingze Zhou, Juncheng Li, Guoming Wang, Siliang Tang, Yueting Zhuang
For evaluation, we build WorldNet, a multimodal state transition prediction benchmark encompassing varied real-life scenarios.
no code implementations • CVPR 2024 • Xinyi Jiang, Guoming Wang, Junhao Guo, Juncheng Li, Wenqiao Zhang, Rongxing Lu, Siliang Tang
On MM-Vet, our method improves the score from 31.1 to 32.4.
no code implementations • 21 Nov 2023 • Minghe Gao, Juncheng Li, Hao Fei, Liang Pang, Wei Ji, Guoming Wang, Zheqi Lv, Wenqiao Zhang, Siliang Tang, Yueting Zhuang
Visual programming, a modular and generalizable paradigm, integrates different modules and Python operators to solve various vision-language tasks.
1 code implementation • 4 Oct 2023 • Dong Chen, Kaihang Pan, Guoming Wang, Yueting Zhuang, Siliang Tang
To learn a more compact latent space for the vision anomaly detector, CMLE learns a correlation structure matrix from the language modality and then uses that matrix to guide the learning of the vision modality's latent space.