1 code implementation • 7 Jul 2025 • Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, Xinqiang Yu, Jiazhao Zhang, Runpei Dong, JiaWei He, He Wang, Zhizheng Zhang, Li Yi, Wenjun Zeng, Xin Jin
However, existing methods are limited to challenging image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial and semantic information.
Ranked #1 on
Robot Manipulation
on CALVIN
no code implementations • 30 May 2025 • Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, huan zhang
$\alpha$1 first introduces $\alpha$ moment, which represents the scaled thinking phase with a universal parameter $\alpha$.
no code implementations • 9 Apr 2025 • Yana Wei, Liang Zhao, Kangheng Lin, En Yu, Yuang Peng, Runpei Dong, Jianjian Sun, Haoran Wei, Zheng Ge, Xiangyu Zhang, Vishal M. Patel
We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models (LVLMs), which are expected yet often fail to achieve perfect perception initially.
2 code implementations • 18 Feb 2025 • Zekun Qi, Wenyao Zhang, Yufei Ding, Runpei Dong, Xinqiang Yu, Jingwen Li, Lingyun Xu, Baoyu Li, Xialin He, Guofan Fan, Jiazhao Zhang, JiaWei He, Jiayuan Gu, Xin Jin, Kaisheng Ma, Zhizheng Zhang, He Wang, Li Yi
Spatial intelligence is a critical component of embodied AI, promoting robots to understand and interact with their environments.
Ranked #1 on
Spatial Reasoning
on EmbSpatial-Bench
1 code implementation • 17 Feb 2025 • Xialin He, Runpei Dong, Zixuan Chen, Saurabh Gupta
Hand-designing controllers for getting up is difficult because of the varied configurations a humanoid can end up in after a fall and the challenging terrains humanoid robots are expected to operate on.
no code implementations • CVPR 2025 • Deyu Zhou, Quan Sun, Yuang Peng, Kun Yan, Runpei Dong, Duomin Wang, Zheng Ge, Nan Duan, Xiangyu Zhang, Lionel M. Ni, Heung-Yeung Shum
We introduce MAGI, a hybrid video generation framework that combines masked modeling for intra-frame generation with causal modeling for next-frame generation.
1 code implementation • 21 Aug 2024 • Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei
Together with the sequential Transformer, the whole module with position encoding comprehensively constructs a multi-scale feature abstraction module that considers both the local parts from the patch and the global parts from center points as position encoding.
3D Parameter-Efficient Fine-Tuning for Classification
3D Point Cloud Classification
+4
1 code implementation • 24 Jun 2024 • Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia
Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive function in creatively generating personalized content.
3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.
Ranked #1 on
3D Question Answering (3D-QA)
on 3D MM-Vet
1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi
This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with frequently overlooked synergy between multimodal comprehension and creation.
Ranked #5 on
Visual Question Answering
on MMBench
2 code implementations • NeurIPS 2023 • Zekun Qi, Muzhou Yu, Runpei Dong, Kaisheng Ma
VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects.
no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang
Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.
no code implementations • 28 Apr 2023 • Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma
Knowledge distillation conducts an effective model compression method while holding some limitations:(1) the feature based distillation methods only focus on distilling the feature map but are lack of transferring the relation of data examples; (2) the relational distillation methods are either limited to the handcrafted functions for relation extraction, such as L2 norm, or weak in inter- and intra- class relation modeling.
no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang
In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon the LSS-based methods and find it already able to enjoy the merits from both sides, i. e., rich long-term information and efficient fusion pipeline.
no code implementations • 8 Mar 2023 • Junbo Zhang, Runpei Dong, Kaisheng Ma
Training a 3D scene understanding model requires complicated human annotations, which are laborious to collect and result in a model only encoding close-set object semantics.
5 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi
This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.
Ranked #1 on
Zero-Shot Transfer 3D Point Cloud Classification
on ModelNet10
(using extra training data)
4 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma
The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.
Ranked #8 on
Few-Shot 3D Point Cloud Classification
on ModelNet40 10-way (10-shot)
(using extra training data)
Few-Shot 3D Point Cloud Classification
Knowledge Distillation
+1
1 code implementation • 12 Jul 2022 • Linfeng Zhang, Xin Chen, Junbo Zhang, Runpei Dong, Kaisheng Ma
The success of deep learning is usually accompanied by the growth in neural network depth.
no code implementations • 25 May 2022 • Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma
In this paper, we propose Region-aware Knowledge Distillation ReKo to compress image-to-image translation models.
1 code implementation • CVPR 2023 • Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma
The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality.
1 code implementation • 30 Dec 2021 • Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma
Besides, an efficient deployment flow for the mobile CPU is developed, achieving up to 7. 46$\times$ inference acceleration on an octa-core ARM CPU.
1 code implementation • 3 Nov 2021 • Sia Huat Tan, Runpei Dong, Kaisheng Ma
Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet) which aims to tackle the challenges of high computation and the lack of robustness based on recurrent downsampled attention mechanism.