no code implementations • 8 Jul 2024 • Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai
Large vision-language models (LVLMs) often suffer from object hallucination, producing objects that are not present in the given images.
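
To make the failure mode concrete, one simple way to quantify it is to compare the set of objects a model mentions against the objects actually annotated in the image. The sketch below is a generic illustration of such a metric, not this paper's evaluation protocol; the object sets are assumed to be extracted upstream.

    # Minimal sketch (not this paper's protocol): measure the object
    # hallucination rate of a generated caption against ground-truth
    # image objects. `mentioned` and `ground_truth` are assumed to be
    # precomputed sets; extracting object names from text is omitted.
    def hallucination_rate(mentioned, ground_truth):
        if not mentioned:
            return 0.0
        return len(mentioned - ground_truth) / len(mentioned)

    # The model mentions a "dog" that is not actually in the image.
    print(hallucination_rate({"cat", "sofa", "dog"}, {"cat", "sofa"}))  # ~0.33
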
no code implementations • 26 Jun 2024 • Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal
Our results suggest that 3D-aware pretraining is a promising approach to improve sample efficiency and generalization of vision-based robotic manipulation policies.
1 code implementation • 24 Jun 2024 • Jing Zhu, YuHang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra
Associating unstructured data with structured information is crucial for real-world tasks that require relevance search.
1 code implementation • 7 Jun 2024 • Zhongmou He, Jing Zhu, Shengyi Qian, Joyce Chai, Danai Koutra
To address the efficiency challenges at inference time, we introduce a retrieval-reranking scheme.
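
As a generic illustration of a retrieval-reranking scheme (the scoring functions below are hypothetical placeholders, not the models used in this work), a cheap retriever first prunes the candidate pool, and a more expensive scorer reranks only the survivors:

    # Generic retrieve-then-rerank sketch. `cheap_score` stands in for a
    # fast similarity function and `expensive_score` for a slower, more
    # accurate one; both are placeholders supplied by the caller.
    def retrieve_rerank(query, candidates, cheap_score, expensive_score, k=100, n=10):
        # Stage 1: score everything cheaply, keep the top k.
        shortlist = sorted(candidates, key=lambda c: cheap_score(query, c), reverse=True)[:k]
        # Stage 2: rerank only the shortlist with the expensive scorer.
        return sorted(shortlist, key=lambda c: expensive_score(query, c), reverse=True)[:n]

    # Toy usage with trivial scorers:
    docs = ["graph", "node", "edge", "link"]
    top = retrieve_rerank("link", docs,
                          cheap_score=lambda q, c: len(set(q) & set(c)),
                          expensive_score=lambda q, c: float(q == c),
                          k=3, n=1)
    print(top)  # ['link']

The design trades a small recall loss in the cheap stage for a large reduction in calls to the expensive scorer at inference time.
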
1 code implementation • 7 Jun 2024 • Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai
The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world.
no code implementations • 12 Jan 2024 • Shengyi Qian, Weifeng Chen, Min Bai, Xiong Zhou, Zhuowen Tu, Li Erran Li
Affordance grounding refers to the task of finding the area of an object with which one can interact.
1 code implementation • 21 Sep 2023 • Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai
While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline.
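
At a high level, a zero-shot LLM-based grounding pipeline of this kind uses an LLM to decompose the query and an off-the-shelf tool to ground the pieces. The sketch below is a hedged approximation of that recipe; llm_decompose and ground_noun_phrase are hypothetical stubs, not the actual LLM-Grounder API.

    # Hedged sketch of a zero-shot LLM-based 3D grounding loop. A real
    # system would call an LLM and an open-vocabulary 3D grounding tool;
    # both helpers here are placeholder stubs.
    def llm_decompose(query):
        # Placeholder: an LLM would split the query into target and
        # landmark noun phrases.
        return [query]

    def ground_noun_phrase(phrase):
        # Placeholder: a grounding tool would return candidate 3D boxes
        # with confidence scores for the phrase.
        return [{"phrase": phrase, "box": (0, 0, 0, 1, 1, 1), "score": 1.0}]

    def ground(query):
        candidates = [c for p in llm_decompose(query) for c in ground_noun_phrase(p)]
        # A full pipeline would also have the LLM reason over spatial
        # relations among candidates; here we simply take the argmax.
        return max(candidates, key=lambda c: c["score"])

    print(ground("the chair next to the window"))
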
no code implementations • 1 Jun 2023 • Jing Zhu, YuHang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai, Xiang Song, Danai Koutra
While Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications, we demonstrate that, in link prediction, the common practice of including the edges being predicted in the graph at training and/or test time has an outsized impact on the performance of low-degree nodes.
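
Concretely, the practice at issue is whether the edges the model must predict are also visible to the GNN as message-passing edges. Below is a minimal, library-free sketch of the leakage-free alternative; it illustrates the general practice being discussed, not the paper's exact pipeline.

    # Remove the edges being predicted from the message-passing graph so
    # the model cannot "see" its own prediction targets. Edges are
    # undirected (u, v) tuples.
    def message_passing_edges(all_edges, target_edges):
        targets = {frozenset(e) for e in target_edges}
        return [e for e in all_edges if frozenset(e) not in targets]

    edges = [(0, 1), (1, 2), (2, 3)]
    held_out = [(1, 2)]  # edges the model must predict
    print(message_passing_edges(edges, held_out))  # [(0, 1), (2, 3)]
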
1 code implementation • ICCV 2023 • Shengyi Qian, David F. Fouhey
Humans can easily understand a single image as depicting multiple potential objects that permit interaction.
2 code implementations • ICCV 2023 • Ziyang Chen, Shengyi Qian, Andrew Owens
In this paper, we use the geometrically consistent changes that images and sounds undergo as a camera rotates to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources.
no code implementations • CVPR 2022 • Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey
We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos.
no code implementations • 2 Dec 2021 • Shengyi Qian, Alexander Kirillov, Nikhila Ravi, Devendra Singh Chaplot, Justin Johnson, David F. Fouhey, Georgia Gkioxari
Humans can perceive scenes in 3D from a handful of 2D views.
1 code implementation • ICCV 2021 • Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey
The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses.
1 code implementation • ECCV 2020 • Shengyi Qian, Linyi Jin, David F. Fouhey
These predictions are then jointly reasoned over to produce the most likely explanation of the scene.
no code implementations • CVPR 2020 • Weifeng Chen, Shengyi Qian, David Fan, Noriyuki Kojima, Max Hamilton, Jia Deng
Single-view 3D is the task of recovering 3D properties such as depth and surface normals from a single image.
no code implementations • CVPR 2019 • Weifeng Chen, Shengyi Qian, Jia Deng
Depth estimation from a single image in the wild remains a challenging problem.