Search Results for author: Weihan Wang

Found 9 papers, 5 papers with code

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

1 code implementation · 6 Feb 2024 · Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.

Visual Reasoning

PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

no code implementations · 30 Dec 2023 · Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu

Through extensive experiments, we demonstrate the effectiveness of PlanarNeRF in various scenarios and a remarkable improvement over existing works.

3D Plane Detection

CogAgent: A Visual Language Model for GUI Agents

1 code implementation · 14 Dec 2023 · Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens.

Language Modelling, Visual Question Answering

ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation

no code implementations · ICCV 2023 · Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun

Vision-language pre-training (VLP) methods have been blossoming recently; their crucial goal is to jointly learn visual and textual features via a transformer-based architecture, and they have shown promising improvements on a variety of vision-language tasks.

Image-Text Matching, Language Modelling (+2 more)

EDI: ESKF-based Disjoint Initialization for Visual-Inertial SLAM Systems

no code implementations · 4 Aug 2023 · Weihan Wang, Jiani Li, Yuhang Ming, Philippos Mordohai

Our method incorporates an Error-state Kalman Filter (ESKF) to estimate gyroscope bias and correct rotation estimates from monocular SLAM, overcoming dependence on pure monocular SLAM for rotation estimation.

Real-Time Dense 3D Mapping of Underwater Environments

1 code implementation · 5 Apr 2023 · Weihan Wang, Bharat Joshi, Nathaniel Burgdorfer, Konstantinos Batsos, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis

To address this problem, we propose to use SVIn2, a robust VIO method, together with a real-time 3D reconstruction pipeline.

3D Reconstruction

Cannot find the paper you are looking for? You can Submit a new open access paper.