1 code implementation • 18 Feb 2025 • Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, Xinqiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, JiaWei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi
Spatial intelligence is a critical component of embodied AI, enabling robots to understand and interact with their environments.
Ranked #1 on Spatial Reasoning on EmbSpatial-Bench
1 code implementation • 21 Aug 2024 • Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei
Together with the sequential Transformer, the module with position encoding forms a multi-scale feature abstraction module that considers both the local parts from the patches and the global parts from the center points, which serve as position encoding.
3D Parameter-Efficient Fine-Tuning for Classification
3D Point Cloud Classification
1 code implementation • 24 Jun 2024 • Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia
Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content.
3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring universal 3D object understanding with 3D point clouds and language.
Ranked #1 on 3D Question Answering (3D-QA) on 3D MM-Vet
1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, the first learning framework to achieve versatile Multimodal Large Language Models (MLLMs) empowered by the frequently overlooked synergy between multimodal comprehension and creation.
Ranked #5 on Visual Question Answering on MMBench
2 code implementations • NeurIPS 2023 • Zekun Qi, Muzhou Yu, Runpei Dong, Kaisheng Ma
VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects.
1 code implementation • 31 May 2023 • Guofan Fan, Zekun Qi, Wenkai Shi, Kaisheng Ma
Geometry and color information provided by the point clouds are both crucial for 3D scene understanding.
Ranked #1 on Unsupervised 3D Semantic Segmentation on ScanNetV2
5 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi
This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.
Ranked #1 on Zero-Shot Transfer 3D Point Cloud Classification on ModelNet10 (using extra training data)
4 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma
The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.
Ranked #8 on Few-Shot 3D Point Cloud Classification on ModelNet40 10-way (10-shot) (using extra training data)
Few-Shot 3D Point Cloud Classification
Knowledge Distillation