Search Results for author: Shengyi Qian

Found 16 papers, 8 papers with code

Multi-Object Hallucination in Vision-Language Models

no code implementations • 8 Jul 2024 • Xuweiyi Chen, Ziqiao Ma, Xuejun Zhang, Sihan Xu, Shengyi Qian, Jianing Yang, David F. Fouhey, Joyce Chai

Large vision-language models (LVLMs) often suffer from object hallucination, producing objects not present in the given images.

Hallucination • Object

3D-MVP: 3D Multiview Pretraining for Robotic Manipulation

no code implementations • 26 Jun 2024 • Shengyi Qian, Kaichun Mo, Valts Blukis, David F. Fouhey, Dieter Fox, Ankit Goyal

Our results suggest that 3D-aware pretraining is a promising approach to improve sample efficiency and generalization of vision-based robotic manipulation policies.

Decoder • Robot Manipulation • +1

Multimodal Graph Benchmark

1 code implementation • 24 Jun 2024 • Jing Zhu, YuHang Zhou, Shengyi Qian, Zhongmou He, Tong Zhao, Neil Shah, Danai Koutra

Associating unstructured data with structured information is crucial for real-world tasks that require relevance search.

Graph Learning

3D-GRAND: A Million-Scale Dataset for 3D-LLMs with Better Grounding and Less Hallucination

1 code implementation • 7 Jun 2024 • Jianing Yang, Xuweiyi Chen, Nikhil Madaan, Madhavan Iyengar, Shengyi Qian, David F. Fouhey, Joyce Chai

The integration of language and 3D perception is crucial for developing embodied agents and robots that comprehend and interact with the physical world.

Hallucination

LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent

1 code implementation • 21 Sep 2023 • Jianing Yang, Xuweiyi Chen, Shengyi Qian, Nikhil Madaan, Madhavan Iyengar, David F. Fouhey, Joyce Chai

While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline.

3D visual grounding • Language Modelling • +3

Pitfalls in Link Prediction with Graph Neural Networks: Understanding the Impact of Target-link Inclusion & Better Practices

no code implementations • 1 Jun 2023 • Jing Zhu, YuHang Zhou, Vassilis N. Ioannidis, Shengyi Qian, Wei Ai, Xiang Song, Danai Koutra

While Graph Neural Networks (GNNs) are remarkably successful in a variety of high-impact applications, we demonstrate that, in link prediction, the common practice of including the edges being predicted in the graph at training and/or test time has an outsized impact on the performance of low-degree nodes.

Link Prediction • Node Classification
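The pitfall the abstract describes can be illustrated with a toy example. The sketch below is hypothetical and not the paper's code: it uses 1-D node features and a single mean-aggregation step (GNN-style message passing) to show that when the target edge is left in the graph, each endpoint aggregates the other's features directly, so the predicted score leaks information about the very edge being evaluated.

```python
# Hypothetical toy sketch of target-link leakage in link prediction.
# Not the paper's implementation; node features and graph are made up.

def aggregate(node, edges, feats):
    """Mean of a node's neighbors' features (one message-passing step)."""
    neigh = [v for u, v in edges if u == node] + [u for u, v in edges if v == node]
    if not neigh:
        return 0.0
    return sum(feats[n] for n in neigh) / len(neigh)

def score(u, v, edges, feats):
    """Product of the two endpoints' aggregated 1-D features."""
    return aggregate(u, edges, feats) * aggregate(v, edges, feats)

feats = {0: 1.0, 1: 1.0, 2: 0.0, 3: 0.0}  # endpoints 0 and 1 are similar
base_edges = [(0, 2), (1, 3)]             # observed graph
target = (0, 1)                           # the edge we want to predict

# Pitfall: score the target edge with that edge still in the graph.
leaky = score(0, 1, base_edges + [target], feats)
# Better practice: exclude the target edge before message passing.
clean = score(0, 1, base_edges, feats)
print(leaky, clean)  # leaky is inflated relative to clean
```

With the target edge included, nodes 0 and 1 see each other's features during aggregation and the score is inflated (0.25 vs. 0.0 here); excluding the edge being predicted removes that leakage, which is the corrected practice the paper advocates.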

Understanding 3D Object Interaction from a Single Image

1 code implementation • ICCV 2023 • Shengyi Qian, David F. Fouhey

Humans can easily understand a single image as depicting multiple potential objects permitting interaction.

Object

Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation

2 code implementations • ICCV 2023 • Ziyang Chen, Shengyi Qian, Andrew Owens

In this paper, we use these cues to solve a problem we call Sound Localization from Motion (SLfM): jointly estimating camera rotation and localizing sound sources.

Understanding 3D Object Articulation in Internet Videos

no code implementations • CVPR 2022 • Shengyi Qian, Linyi Jin, Chris Rockwell, Siyi Chen, David F. Fouhey

We propose to investigate detecting and characterizing the 3D planar articulation of objects from ordinary videos.

Object

Planar Surface Reconstruction from Sparse Views

1 code implementation • ICCV 2021 • Linyi Jin, Shengyi Qian, Andrew Owens, David F. Fouhey

The paper studies planar surface reconstruction of indoor scenes from two views with unknown camera poses.

Surface Reconstruction

Associative3D: Volumetric Reconstruction from Sparse Views

1 code implementation • ECCV 2020 • Shengyi Qian, Linyi Jin, David F. Fouhey

This information is then jointly reasoned over to produce the most likely explanation of the scene.

3D Volumetric Reconstruction

OASIS: A Large-Scale Dataset for Single Image 3D in the Wild

no code implementations • CVPR 2020 • Weifeng Chen, Shengyi Qian, David Fan, Noriyuki Kojima, Max Hamilton, Jia Deng

Single-view 3D is the task of recovering 3D properties such as depth and surface normals from a single image.

3D geometry
