Search Results for author: Runsen Xu

Found 10 papers, 9 papers with code

VLM-Grounder: A VLM Agent for Zero-Shot 3D Visual Grounding

1 code implementation17 Oct 2024 Runsen Xu, Zhiwei Huang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

3D visual grounding is crucial for robots, requiring integration of natural language and 3D scene understanding.

3D geometry 3D visual grounding +2

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

1 code implementation13 Jun 2024 Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress.

3D visual grounding Attribute +1

Grounded 3D-LLM with Referent Tokens

1 code implementation16 May 2024 Yilun Chen, Shuai Yang, Haifeng Huang, Tai Wang, Runsen Xu, Ruiyuan Lyu, Dahua Lin, Jiangmiao Pang

To facilitate the use of referent tokens in subsequent language modeling, we provide a large-scale, automatically curated grounded scene-text dataset with over 1 million phrase-to-region correspondences and introduce Contrastive Language-Scene Pre-training (CLASP) to perform phrase-level scene-text alignment using this data.

Dense Captioning Diversity +7

PointLLM: Empowering Large Language Models to Understand Point Clouds

3 code implementations31 Aug 2023 Runsen Xu, Xiaolong Wang, Tai Wang, Yilun Chen, Jiangmiao Pang, Dahua Lin

The unprecedented advancements in Large Language Models (LLMs) have shown a profound impact on natural language processing but are yet to fully embrace the realm of 3D understanding.

3D Object Captioning 3D Question Answering (3D-QA) +3

MV-JAR: Masked Voxel Jigsaw and Reconstruction for LiDAR-Based Self-Supervised Pre-Training

1 code implementation CVPR 2023 Runsen Xu, Tai Wang, Wenwei Zhang, Runjian Chen, Jinkun Cao, Jiangmiao Pang, Dahua Lin

This paper introduces the Masked Voxel Jigsaw and Reconstruction (MV-JAR) method for LiDAR-based self-supervised pre-training and a carefully designed data-efficient 3D object detection benchmark on the Waymo dataset.

3D Object Detection object-detection

CO^3: Cooperative Unsupervised 3D Representation Learning for Autonomous Driving

1 code implementation8 Jun 2022 Runjian Chen, Yao Mu, Runsen Xu, Wenqi Shao, Chenhan Jiang, Hang Xu, Zhenguo Li, Ping Luo

In this paper, we propose CO^3, namely Cooperative Contrastive Learning and Contextual Shape Prediction, to learn 3D representation for outdoor-scene point clouds in an unsupervised manner.

Autonomous Driving Contrastive Learning +1

LIFE: Lighting Invariant Flow Estimation

no code implementations7 Apr 2021 Zhaoyang Huang, Xiaokun Pan, Runsen Xu, Yan Xu, Ka Chun Cheung, Guofeng Zhang, Hongsheng Li

However, local image contents are inevitably ambiguous and error-prone during the cross-image feature matching process, which hinders downstream tasks.

Cannot find the paper you are looking for? You can Submit a new open access paper.