no code implementations • 31 Mar 2025 • Yun Li, Yiming Zhang, Tao Lin, Xiangrui Liu, Wenxiao Cai, Zheng Liu, Bo Zhao
The use of Multimodal Large Language Models (MLLMs) as an end-to-end solution for Embodied AI and Autonomous Driving has become a prevailing trend.
no code implementations • 17 Feb 2025 • Yuanze Xu, Ming Dai, Wenxiao Cai, Wankou Yang
Image retrieval has been employed as a robust complementary technique to address the challenge of Unmanned Aerial Vehicle (UAV) self-positioning.
no code implementations • 11 Feb 2025 • Wenxiao Cai, Thomas H. Lee
In this context, we propose a new, energy-efficient machine learning framework implemented on CMOS Oscillator Networks (OscNet).
no code implementations • 24 Dec 2024 • Wenxiao Cai, Dongting Hu, Ruoyan Yin, Jiankang Deng, Huan Fu, Wankou Yang, Mingming Gong
Stereo matching plays a crucial role in various applications, where understanding uncertainty can enhance both safety and reliability.
1 code implementation • 19 Jun 2024 • Wenxiao Cai, Iaroslav Ponomarenko, Jianhao Yuan, Xiaoqi Li, Wankou Yang, Hao Dong, Bo Zhao
Vision Language Models (VLMs) have achieved impressive performance in 2D image understanding; however, they still struggle with spatial understanding, which is the foundation of Embodied AI.
Ranked #4 on Spatial Reasoning on 6-DoF SpatialBench
1 code implementation • 31 Mar 2024 • Wenxiao Cai, Xinyue Lei, Xinyu He, Junming Leo Chen, Yangang Wang
To clearly reconstruct dynamic scenes, we propose a new framework that considers two frames at a time. We pretrain a NeRF model for an articulated object. When the articulated object moves, Knowledge NeRF learns to generate novel views at the new state by incorporating past knowledge in the pretrained NeRF model with minimal observations in the present state.
1 code implementation • 20 Feb 2024 • Wenxiao Cai, Wankou Yang
The topic of stitching images with globally natural structures holds paramount significance, with two main goals: pixel-level alignment and distortion prevention.
1 code implementation • 23 May 2023 • Wenxiao Cai, Ke Jin, Jinyan Hou, Cong Guo, Letian Wu, Wankou Yang
Semantic segmentation of drone images is critical for various aerial vision tasks as it provides essential semantic details to understand scenes on the ground.
Ranked #1 on Semantic Segmentation on VDD