Search Results for author: Weihan Wang

Found 9 papers, 5 papers with code

CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion

no code implementations • 8 Mar 2024 • Wendi Zheng, Jiayan Teng, Zhuoyi Yang, Weihan Wang, Jidong Chen, Xiaotao Gu, Yuxiao Dong, Ming Ding, Jie Tang

Recent advancements in text-to-image generative systems have been largely driven by diffusion models.

Computational Efficiency, Super-Resolution +1

CogCoM: Train Large Vision-Language Models Diving into Details through Chain of Manipulations

1 code implementation • 6 Feb 2024 • Ji Qi, Ming Ding, Weihan Wang, Yushi Bai, Qingsong Lv, Wenyi Hong, Bin Xu, Lei Hou, Juanzi Li, Yuxiao Dong, Jie Tang

Vision-Language Models (VLMs) have demonstrated their widespread viability thanks to extensive training in aligning visual instructions to answers.

Visual Reasoning

122

PlanarNeRF: Online Learning of Planar Primitives with Neural Radiance Fields

no code implementations • 30 Dec 2023 • Zheng Chen, Qingan Yan, Huangying Zhan, Changjiang Cai, Xiangyu Xu, Yuzhong Huang, Weihan Wang, Ziyue Feng, Lantao Liu, Yi Xu

Through extensive experiments, we demonstrate the effectiveness of PlanarNeRF in various scenarios and remarkable improvement over existing works.

3D Plane Detection

CogAgent: A Visual Language Model for GUI Agents

1 code implementation • 14 Dec 2023 • Wenyi Hong, Weihan Wang, Qingsong Lv, Jiazheng Xu, Wenmeng Yu, Junhui Ji, Yan Wang, Zihan Wang, Yuxuan Zhang, Juanzi Li, Bin Xu, Yuxiao Dong, Ming Ding, Jie Tang

People are spending an enormous amount of time on digital devices through graphical user interfaces (GUIs), e.g., computer or smartphone screens.

Ranked #15 on Visual Question Answering on MM-Vet

Language Modelling, Visual Question Answering

5,035

CogVLM: Visual Expert for Pretrained Language Models

1 code implementation • 6 Nov 2023 • Weihan Wang, Qingsong Lv, Wenmeng Yu, Wenyi Hong, Ji Qi, Yan Wang, Junhui Ji, Zhuoyi Yang, Lei Zhao, Xixuan Song, Jiazheng Xu, Bin Xu, Juanzi Li, Yuxiao Dong, Ming Ding, Jie Tang

We introduce CogVLM, a powerful open-source visual language foundation model.

Ranked #4 on Visual Question Answering (VQA) on InfiMM-Eval

Language Modelling, Visual Question Answering

5,035

ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation

no code implementations • ICCV 2023 • Weihan Wang, Zhen Yang, Bin Xu, Juanzi Li, Yankui Sun

Vision-language pre-training (VLP) methods have flourished recently. Their central goal is to jointly learn visual and textual features via a transformer-based architecture, and they have demonstrated promising improvements on a variety of vision-language tasks.

Image-text matching, Language Modelling +2

EDI: ESKF-based Disjoint Initialization for Visual-Inertial SLAM Systems

no code implementations • 4 Aug 2023 • Weihan Wang, Jiani Li, Yuhang Ming, Philippos Mordohai

Our method incorporates an Error-state Kalman Filter (ESKF) to estimate gyroscope bias and correct rotation estimates from monocular SLAM, overcoming dependence on pure monocular SLAM for rotation estimation.

Real-Time Dense 3D Mapping of Underwater Environments

1 code implementation • 5 Apr 2023 • Weihan Wang, Bharat Joshi, Nathaniel Burgdorfer, Konstantinos Batsos, Alberto Quattrini Li, Philippos Mordohai, Ioannis Rekleitis

To address this problem, we propose to use SVIn2, a robust VIO method, together with a real-time 3D reconstruction pipeline.

3D Reconstruction

Learning the Distribution of Errors in Stereo Matching for Joint Disparity and Uncertainty Estimation

1 code implementation • CVPR 2023 • Liyan Chen, Weihan Wang, Philippos Mordohai

We present a new loss function for joint disparity and uncertainty estimation in deep stereo matching.

Multi-Task Learning, Stereo Matching
