no code implementations • 8 Mar 2024 • Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang
Recent advancements in text-to-image generative systems have been largely driven by diffusion models.
1 code implementation • 6 Feb 2024 • Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang
Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.
no code implementations • 30 Dec 2023 • Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu
Through extensive experiments, we demonstrate the effectiveness of PlanarNeRF in various scenarios and remarkable improvement over existing works.
1 code implementation • 14 Dec 2023 • Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang
People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e. g., computer or smartphone screens.
Ranked #15 on Visual Question Answering on MM-Vet
1 code implementation • 6 Nov 2023 • Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang
We introduce CogVLM, a powerful open-source visual language foundation model.
Ranked #4 on Visual Question Answering (VQA) on InfiMM-Eval
no code implementations • ICCV 2023 • Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun
Vision-language pre-training (VLP) methods are blossoming recently, and its crucial goal is to jointly learn visual and textual features via a transformer-based architecture, demonstrating promising improvements on a variety of vision-language tasks.
no code implementations • 4 Aug 2023 • Weihan Wang, Jiani Li, Yuhang Ming, Philippos Mordohai
Our method incorporates an Error-state Kalman Filter (ESKF) to estimate gyroscope bias and correct rotation estimates from monocular SLAM, overcoming dependence on pure monocular SLAM for rotation estimation.
1 code implementation • 5 Apr 2023 • Weihan Wang, Bharat Joshi, Nathaniel Burgdorfer, Konstantinos Batsos, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis
To address this problem, we propose to use SVIn2, a robust VIO method, together with a real-time 3D reconstruction pipeline.
1 code implementation • CVPR 2023 • Liyan Chen, Weihan Wang, Philippos Mordohai
We present a new loss function for joint disparity and uncertainty estimation in deep stereo matching.