3D visual grounding

39 papers with code • 0 benchmarks • 2 datasets

3D visual grounding aims to localize the object in a 3D scene, typically represented as a point cloud, that is referred to by a free-form natural-language description.

Most implemented papers

ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance

ivan-tang-3d/viewrefer3d 29 Mar 2023

In this paper, we propose ViewRefer, a multi-view framework for 3D visual grounding that explores how to grasp view knowledge from both the text and 3D modalities.
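As a hedged illustration of the multi-view idea, the sketch below fuses per-view 3D object features with the referring expression via cross-attention and pools over views. All module names, shapes, and the fusion scheme are assumptions for illustration, not the ivan-tang-3d/viewrefer3d API; the GPT and prototype guidance named in the title are omitted.

```python
# Minimal sketch of multi-view language-conditioned fusion for 3D grounding,
# in the spirit of ViewRefer (NOT the repo's actual code; names are illustrative).
import torch
import torch.nn as nn

class MultiViewFusion(nn.Module):
    """Fuse per-view object features with text features, then pool over views."""
    def __init__(self, dim=256, num_heads=8):
        super().__init__()
        # Cross-attention: object tokens attend to text tokens.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.score_head = nn.Linear(dim, 1)  # per-object grounding score

    def forward(self, obj_feats, text_feats):
        # obj_feats:  (batch, views, objects, dim) - 3D proposals seen from several views
        # text_feats: (batch, tokens, dim)         - encoded referring expression
        b, v, o, d = obj_feats.shape
        x = obj_feats.reshape(b * v, o, d)
        t = text_feats.repeat_interleave(v, dim=0)       # one text copy per view
        x, _ = self.cross_attn(query=x, key=t, value=t)  # language-conditioned objects
        x = x.view(b, v, o, d).mean(dim=1)               # aggregate across views
        return self.score_head(x).squeeze(-1)            # (batch, objects) logits

scores = MultiViewFusion()(torch.randn(2, 4, 16, 256), torch.randn(2, 20, 256))
print(scores.shape)  # torch.Size([2, 16]) - highest logit = predicted referred object
```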

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

yanmin-wu/eda CVPR 2023

3D visual grounding aims to find the object in a point cloud that is mentioned by a free-form natural-language description with rich semantic cues.

InstanceRefer: Cooperative Holistic Understanding for Visual Grounding on Point Clouds through Instance Multi-level Contextual Referring

CurryYuan/InstanceRefer ICCV 2021

Compared with visual grounding on 2D images, natural-language-guided 3D object localization on point clouds is more challenging.

SAT: 2D Semantics Assisted Training for 3D Visual Grounding

zyang-ur/SAT ICCV 2021

3D visual grounding aims to ground a natural-language description of a 3D scene, usually represented as a point cloud, to the targeted object region.

Multi-View Transformer for 3D Visual Grounding

sega-hsj/mvt-3dvg CVPR 2022

The multi-view space enables the network to learn a more robust multi-modal representation for 3D visual grounding and eliminates the dependence on specific views.
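One plausible reading of "multi-view space" is sketched below: encode each object under several rotated coordinate frames and average the results, so the prediction no longer depends on one arbitrary scene orientation. Module names and shapes are assumptions, not the sega-hsj/mvt-3dvg code.

```python
# Hedged sketch of a view-independent multi-view encoder.
import math
import torch
import torch.nn as nn

def rotate_z(xyz, angle):
    """Rotate points (..., 3) around the z axis by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    rot = xyz.new_tensor([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return xyz @ rot.T

class MultiViewEncoder(nn.Module):
    def __init__(self, dim=256, num_views=4):
        super().__init__()
        self.num_views = num_views
        self.pos_embed = nn.Linear(3, dim)  # view-dependent positional encoding
        self.encoder = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)

    def forward(self, obj_feats, obj_centers):
        # obj_feats:   (batch, objects, dim)  view-agnostic appearance features
        # obj_centers: (batch, objects, 3)    object positions in scene coordinates
        outs = []
        for k in range(self.num_views):
            angle = 2 * math.pi * k / self.num_views
            pos = self.pos_embed(rotate_z(obj_centers, angle))
            outs.append(self.encoder(obj_feats + pos))
        return torch.stack(outs).mean(dim=0)  # aggregate over rotated views

feats = MultiViewEncoder()(torch.randn(2, 16, 256), torch.randn(2, 16, 3))
print(feats.shape)  # torch.Size([2, 16, 256])
```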

3D-SPS: Single-Stage 3D Visual Grounding via Referred Point Progressive Selection

fjhzhixi/3d-sps CVPR 2022

3D visual grounding aims to locate the referred target object in 3D point cloud scenes according to a free-form language description.
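The title points to a single-stage pipeline that progressively selects language-relevant keypoints. Below is a minimal, hypothetical sketch of such a selection loop (assumed shapes and modules, not the fjhzhixi/3d-sps implementation): score every point against a pooled sentence feature and keep a shrinking top-k, so localization needs no separate detection stage.

```python
# Hypothetical language-guided progressive point selection.
import torch
import torch.nn as nn

class ProgressiveSelector(nn.Module):
    def __init__(self, dim=256, keep_ratios=(0.5, 0.25, 0.125)):
        super().__init__()
        self.keep_ratios = keep_ratios
        self.relevance = nn.Bilinear(dim, dim, 1)  # point-vs-text relevance score

    def forward(self, point_feats, point_xyz, text_feat):
        # point_feats: (batch, points, dim), point_xyz: (batch, points, 3)
        # text_feat:   (batch, dim) pooled sentence embedding
        for ratio in self.keep_ratios:
            n = point_feats.size(1)
            t = text_feat.unsqueeze(1).expand(-1, n, -1)
            scores = self.relevance(point_feats, t).squeeze(-1)  # (batch, points)
            k = max(1, int(n * ratio))
            idx = scores.topk(k, dim=1).indices                  # keep most relevant
            point_feats = torch.gather(
                point_feats, 1, idx.unsqueeze(-1).expand(-1, -1, point_feats.size(-1)))
            point_xyz = torch.gather(point_xyz, 1, idx.unsqueeze(-1).expand(-1, -1, 3))
        return point_xyz  # surviving keypoints concentrate on the referred target

kept = ProgressiveSelector()(torch.randn(2, 1024, 256),
                             torch.randn(2, 1024, 3), torch.randn(2, 256))
print(kept.shape)  # torch.Size([2, 16, 3]) after 1024 -> 512 -> 128 -> 16
```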

Learning Point-Language Hierarchical Alignment for 3D Visual Grounding

ppjmchen/ham 22 Oct 2022

This paper presents a novel hierarchical alignment model (HAM) that learns multi-granularity visual and linguistic representations in an end-to-end manner.
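A hedged sketch of what multi-granularity alignment can look like follows: a symmetric InfoNCE-style loss applied at both a fine (word/object) and a coarse (sentence/scene) level, trained jointly. The pooling and loss choices here are illustrative assumptions, not the modules defined in ppjmchen/ham.

```python
# Illustrative two-level alignment loss, not HAM's actual objective.
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings a[i] <-> b[i]."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.T / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

def hierarchical_alignment_loss(word_feats, obj_feats, sent_feat, scene_feat):
    # word_feats: (batch, words, dim)   obj_feats:  (batch, objects, dim)
    # sent_feat:  (batch, dim)          scene_feat: (batch, dim)
    fine = contrastive_loss(word_feats.mean(dim=1), obj_feats.mean(dim=1))
    coarse = contrastive_loss(sent_feat, scene_feat)
    return fine + coarse  # both granularities trained end-to-end

loss = hierarchical_alignment_loss(torch.randn(8, 20, 256), torch.randn(8, 16, 256),
                                   torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```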

Look Around and Refer: 2D Synthetic Semantics Knowledge Distillation for 3D Visual Grounding

eslambakr/LAR-Look-Around-and-Refer 25 Nov 2022

The main question we address in this paper is: "Can we consolidate the 3D visual stream with 2D clues synthesized from point clouds and efficiently utilize them in training and testing?"
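One simple way to realize this kind of 2D-to-3D knowledge transfer is a distillation loss, sketched below under assumed names and shapes (the eslambakr/LAR-Look-Around-and-Refer repo differs in detail): a teacher branch encodes images synthesized from the point cloud, the 3D object features are pulled toward it, and only the 3D stream is needed at test time.

```python
# Hypothetical 2D->3D feature distillation loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

proj_3d_to_2d = nn.Linear(256, 512)  # map 3D object features into the 2D space

def distillation_loss(obj_feats_3d, obj_feats_2d):
    # obj_feats_3d: (batch, objects, 256) from the point-cloud branch (student)
    # obj_feats_2d: (batch, objects, 512) from synthesized 2D views (teacher)
    student = F.normalize(proj_3d_to_2d(obj_feats_3d), dim=-1)
    teacher = F.normalize(obj_feats_2d, dim=-1).detach()  # no gradients to teacher
    return (1.0 - (student * teacher).sum(dim=-1)).mean()  # cosine-distance loss

loss = distillation_loss(torch.randn(2, 16, 256), torch.randn(2, 16, 512))
print(loss.item())
```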

Context-Aware Alignment and Mutual Masking for 3D-Language Pre-Training

leolyj/3d-vlp CVPR 2023

Current approaches to 3D visual reasoning are task-specific and lack pre-training methods for learning generic representations that can transfer across various tasks.
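The sketch below illustrates the "mutual masking" pre-training idea named in the title under assumed shapes and modules (not the leolyj/3d-vlp implementation): mask a fraction of object tokens and word tokens, encode both streams jointly, and regress the masked features from cross-modal context.

```python
# Hypothetical mutual-masking pre-training objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MutualMaskedPretrainer(nn.Module):
    def __init__(self, dim=256, mask_ratio=0.3):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.joint_encoder = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)

    def mask(self, x):
        # Replace a random subset of tokens with a learned mask token.
        keep = torch.rand(x.shape[:2], device=x.device) > self.mask_ratio
        return torch.where(keep.unsqueeze(-1), x, self.mask_token.expand_as(x)), keep

    def forward(self, obj_feats, word_feats):
        # obj_feats: (batch, objects, dim), word_feats: (batch, words, dim)
        obj_in, obj_keep = self.mask(obj_feats)
        word_in, word_keep = self.mask(word_feats)
        out = self.joint_encoder(torch.cat([obj_in, word_in], dim=1))
        obj_out, word_out = out.split([obj_feats.size(1), word_feats.size(1)], dim=1)
        # Reconstruct only the masked positions of each modality from the other.
        obj_loss = F.mse_loss(obj_out[~obj_keep], obj_feats[~obj_keep])
        word_loss = F.mse_loss(word_out[~word_keep], word_feats[~word_keep])
        return obj_loss + word_loss

loss = MutualMaskedPretrainer()(torch.randn(2, 16, 256), torch.randn(2, 20, 256))
print(loss.item())
```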