Search Results for author: Wenxiao Cai

Found 8 papers, 4 papers with code

STI-Bench: Are MLLMs Ready for Precise Spatial-Temporal World Understanding?

no code implementations · 31 Mar 2025 · Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao

The use of Multimodal Large Language Models (MLLMs) as an end-to-end solution for Embodied AI and Autonomous Driving has become a prevailing trend.

Autonomous Driving

Precise GPS-Denied UAV Self-Positioning via Context-Enhanced Cross-View Geo-Localization

no code implementations · 17 Feb 2025 · Yuanze Xu, Ming Dai, Wenxiao Cai, Wankou Yang

Image retrieval has been employed as a robust complementary technique to address the challenge of Unmanned Aerial Vehicle (UAV) self-positioning.

geo-localization, Image Retrieval, +1

OscNet: Machine Learning on CMOS Oscillator Networks

no code implementations · 11 Feb 2025 · Wenxiao Cai, Thomas H. Lee

In this context, we propose a new and energy-efficient machine learning framework implemented on CMOS Oscillator Networks (OscNet).

Uncertainty Quantification in Stereo Matching

no code implementations · 24 Dec 2024 · Wenxiao Cai, Dongting Hu, Ruoyan Yin, Jiankang Deng, Huan Fu, Wankou Yang, Mingming Gong

Stereo matching plays a crucial role in various applications, where understanding uncertainty can enhance both safety and reliability.

Decision Making, Disentanglement, +2

SpatialBot: Precise Spatial Understanding with Vision Language Models

1 code implementation · 19 Jun 2024 · Wenxiao Cai, Iaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao

Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding; however, they still struggle with spatial understanding, which is the foundation of Embodied AI.

Spatial Reasoning

Knowledge NeRF: Few-shot Novel View Synthesis for Dynamic Articulated Objects

1 code implementation · 31 Mar 2024 · Wenxiao Cai, Xinyue Lei, Xinyu He, Junming Leo Chen, Yangang Wang

To clearly reconstruct dynamic scenes, we propose a new framework that considers two frames at a time. We pretrain a NeRF model for an articulated object. When the articulated object moves, Knowledge NeRF learns to generate novel views at the new state by combining past knowledge in the pretrained NeRF model with minimal observations of the present state.

NeRF, Novel View Synthesis

Object-level Geometric Structure Preserving for Natural Image Stitching

1 code implementation · 20 Feb 2024 · Wenxiao Cai, Wankou Yang

Stitching images with globally natural structures is an important problem with two main goals: pixel-level alignment and distortion prevention.

Image Stitching, Object, +1

VDD: Varied Drone Dataset for Semantic Segmentation

1 code implementation · 23 May 2023 · Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang

Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground.

Image Segmentation, Segmentation, +1
