Search Results for author: Chenming Zhu

Found 9 papers, 5 papers with code

LLaVA-3D: A Simple yet Effective Pathway to Empowering LMMs with 3D-awareness

no code implementations • 26 Sep 2024 • Chenming Zhu, Tai Wang, Wenwei Zhang, Jiangmiao Pang, Xihui Liu

Recent advancements in Large Multimodal Models (LMMs) have greatly enhanced their proficiency in 2D visual understanding tasks, enabling them to effectively process and understand images and videos.

Scene Understanding

ScanReason: Empowering 3D Visual Grounding with Reasoning Capabilities

no code implementations • 1 Jul 2024 • Chenming Zhu, Tai Wang, Wenwei Zhang, Kai Chen, Xihui Liu

Although great progress has been made in 3D visual grounding, current models still rely on explicit textual descriptions for grounding and lack the ability to reason human intentions from implicit instructions.

3D visual grounding, Language Modeling, +2

MMScan: A Multi-Modal 3D Scene Dataset with Hierarchical Grounded Language Annotations

1 code implementation • 13 Jun 2024 • Ruiyuan Lyu, Tai Wang, Jingli Lin, Shuai Yang, Xiaohan Mao, Yilun Chen, Runsen Xu, Haifeng Huang, Chenming Zhu, Dahua Lin, Jiangmiao Pang

With the emergence of LLMs and their integration with other data modalities, multi-modal 3D perception attracts more attention due to its connectivity to the physical world and makes rapid progress.

3D visual grounding, Attribute, +1

An Empirical Study of Training State-of-the-Art LiDAR Segmentation Models

1 code implementation • 23 May 2024 • Jiahao Sun, Chunmei Qing, Xiang Xu, Lingdong Kong, Youquan Liu, Li Li, Chenming Zhu, Jingwei Zhang, Zeqi Xiao, Runnan Chen, Tai Wang, Wenwei Zhang, Kai Chen

In the rapidly evolving field of autonomous driving, precise segmentation of LiDAR data is crucial for understanding complex 3D environments.

Autonomous Driving, Benchmarking, +3

Object2Scene: Putting Objects in Context for Open-Vocabulary 3D Detection

no code implementations • 18 Sep 2023 • Chenming Zhu, Wenwei Zhang, Tai Wang, Xihui Liu, Kai Chen

Instead of leveraging 2D images, we propose Object2Scene, the first approach that leverages large-scale large-vocabulary 3D object datasets to augment existing 3D scene datasets for open-vocabulary 3D object detection.

3D Object Detection, 3D Open-Vocabulary Object Detection, +4

MV-FCOS3D++: Multi-View Camera-Only 4D Object Detection with Pretrained Monocular Backbones

1 code implementation • 26 Jul 2022 • Tai Wang, Qing Lian, Chenming Zhu, Xinge Zhu, Wenwei Zhang

In this technical report, we present our solution, dubbed MV-FCOS3D++, for the Camera-Only 3D Detection track in Waymo Open Dataset Challenge 2022.

Object Detection, +1

SharpContour: A Contour-based Boundary Refinement Approach for Efficient and Accurate Instance Segmentation

no code implementations • CVPR 2022 • Chenming Zhu, Xuanye Zhang, Yanran Li, Liangdong Qiu, Kai Han, Xiaoguang Han

Contour-based models are efficient and generic to be incorporated with any existing segmentation methods, but they often generate over-smoothed contour and tend to fail on corner areas.

Instance Segmentation, Segmentation, +1
