2 code implementations • 4 Jan 2025 • Zongxia Li, Xiyang Wu, Hongyang Du, Fuxiao Liu, Huy Nghiem, Guangyao Shi
Multimodal Vision-Language Models (VLMs) have emerged as a transformative topic at the intersection of computer vision and natural language processing, enabling machines to perceive and reason about the world through both visual and textual modalities.
no code implementations • 26 Sep 2024 • Ruiqi Xian, Xiyang Wu, Tianrui Guan, Xijun Wang, Boqing Gong, Dinesh Manocha
We introduce SOAR, a novel Self-supervised pretraining algorithm for aerial footage captured by Unmanned Aerial Vehicles (UAVs).
2 code implementations • 16 Jun 2024 • Xiyang Wu, Tianrui Guan, Dianqi Li, Shuaiyi Huang, Xiaoyu Liu, Xijun Wang, Ruiqi Xian, Abhinav Shrivastava, Furong Huang, Jordan Lee Boyd-Graber, Tianyi Zhou, Dinesh Manocha
This motivates the development of AutoHallusion, the first automated benchmark generation approach that employs several key strategies to create a diverse range of hallucination examples.
Ranked #1 on Visual Question Answering (VQA) on AutoHallusion
1 code implementation • 4 Apr 2024 • Tianrui Guan, Ruiqi Xian, Xijun Wang, Xiyang Wu, Mohamed Elnoor, Daeun Song, Dinesh Manocha
We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps.
1 code implementation • 15 Feb 2024 • Xiyang Wu, Souradip Chakraborty, Ruiqi Xian, Jing Liang, Tianrui Guan, Fuxiao Liu, Brian M. Sadler, Dinesh Manocha, Amrit Singh Bedi
In this work, we highlight vulnerabilities in robotic systems integrating large language models (LLMs) and vision-language models (VLMs) due to input modality sensitivities.
7 code implementations • CVPR 2024 • Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou
Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench
1 code implementation • 9 Jun 2023 • Xiyang Wu, Rohan Chandra, Tianrui Guan, Amrit Singh Bedi, Dinesh Manocha
Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations.
1 code implementation • 31 Oct 2020 • Esmaeil Seraj, Xiyang Wu, Matthew Gombolay
The FireCommander environment supports research spanning Reinforcement Learning (RL), Learning from Demonstration (LfD), coordination, psychology, Human-Robot Interaction (HRI), and teaming.