Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench
Our approach for intent-aware planning, iPLAN, allows agents to infer nearby drivers' intents solely from their local observations.
We evaluate our approach on datasets consisting of both ground camera videos and aerial videos, and scenes with single-agent and multi-agent actions.
Ranked #1 on Action Recognition on Okutama-Action
We present CrossLoc3D, a novel 3D place recognition method that solves a large-scale point matching problem in a cross-source setting.
Ranked #1 on 3D Place Recognition on Oxford RobotCar Dataset
We propose a novel approach for aerial video action recognition.
Ranked #1 on Action Recognition on RoCoG-v2
We present a visual and inertial-based terrain classification network (VINet) for robotic navigation over different traversable surfaces.
Our formulation uses a novel Fourier object disentanglement method to innately separate out the human agent (which is typically small) from the background.
Ranked #1 on Action Recognition on UAV Human
We present a novel architecture for 3D object detection, M3DeTR, which combines different point cloud representations (raw, voxels, bird's-eye view) with different feature scales based on multi-scale feature pyramids.
Ranked #1 on 3D Object Detection on KITTI Cars Hard val
We interface GANav with a deep reinforcement learning-based navigation algorithm and highlight its benefits in terms of navigation in real-world unstructured terrains.
Ranked #1 on Semantic Segmentation on RUGD
In practice, our approach reduces the average prediction error by more than 54% over prior algorithms and achieves a weighted average accuracy of 91.2% for behavior prediction.
Ranked #1 on Trajectory Prediction on ApolloScape