5 Jul 2022 • Zhihao Yuan, Xu Yan, Zhuo Li, Xuhao Li, Yao Guo, Shuguang Cui, Zhen Li
Recent progress in 3D scene understanding has explored 3D visual grounding (3DVG), which localizes a target object in a scene from a language description.
Object Representation Learning