Search Results for author: Runpei Dong

Found 22 papers, 14 papers with code

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

1 code implementation • 7 Jul 2025 • Wenyao Zhang, Hongsi Liu, Zekun Qi, Yunnan Wang, Xinqiang Yu, Jiazhao Zhang, Runpei Dong, JiaWei He, He Wang, Zhizheng Zhang, Li Yi, Wenjun Zeng, Xin Jin

However, existing methods are limited to image-based forecasting, which suffers from redundant information and lacks comprehensive and critical world knowledge, including dynamic, spatial, and semantic information.

Image Generation • Multimodal Reasoning +3

AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time

no code implementations • 30 May 2025 • Junyu Zhang, Runpei Dong, Han Wang, Xuying Ning, Haoran Geng, Peihao Li, Xialin He, Yutong Bai, Jitendra Malik, Saurabh Gupta, Huan Zhang

$\alpha$1 first introduces the $\alpha$ moment, which represents the scaled thinking phase with a universal parameter $\alpha$.

Answer Generation

Perception in Reflection

no code implementations • 9 Apr 2025 • Yana Wei, Liang Zhao, Kangheng Lin, En Yu, Yuang Peng, Runpei Dong, Jianjian Sun, Haoran Wei, Zheng Ge, Xiangyu Zhang, Vishal M. Patel

We present a perception in reflection paradigm designed to transcend the limitations of current large vision-language models (LVLMs), which are expected yet often fail to achieve perfect perception initially.

Hallucination

Learning Getting-Up Policies for Real-World Humanoid Robots

1 code implementation • 17 Feb 2025 • Xialin He, Runpei Dong, Zixuan Chen, Saurabh Gupta

Hand-designing controllers for getting up is difficult because of the varied configurations a humanoid can end up in after a fall and the challenging terrains humanoid robots are expected to operate on.

Taming Teacher Forcing for Masked Autoregressive Video Generation

no code implementations • CVPR 2025 • Deyu Zhou, Quan Sun, Yuang Peng, Kun Yan, Runpei Dong, Duomin Wang, Zheng Ge, Nan Duan, Xiangyu Zhang, Lionel M. Ni, Heung-Yeung Shum

We introduce MAGI, a hybrid video generation framework that combines masked modeling for intra-frame generation with causal modeling for next-frame generation.

Video Generation • Video Prediction

Positional Prompt Tuning for Efficient 3D Representation Learning

1 code implementation • 21 Aug 2024 • Shaochen Zhang, Zekun Qi, Runpei Dong, Xiuxiu Bai, Xing Wei

Together with the sequential Transformer, the whole module with position encoding constructs a multi-scale feature abstraction that considers both the local parts from patches and the global parts from center points as position encoding.

3D Parameter-Efficient Fine-Tuning for Classification • 3D Point Cloud Classification +4

DreamBench++: A Human-Aligned Benchmark for Personalized Image Generation

1 code implementation • 24 Jun 2024 • Yuang Peng, Yuxin Cui, Haomiao Tang, Zekun Qi, Runpei Dong, Jing Bai, Chunrui Han, Zheng Ge, Xiangyu Zhang, Shu-Tao Xia

Personalized image generation holds great promise in assisting humans in everyday work and life due to its impressive ability to creatively generate personalized content.

Benchmarking • Image Generation +1

ShapeLLM: Universal 3D Object Understanding for Embodied Interaction

3 code implementations • 27 Feb 2024 • Zekun Qi, Runpei Dong, Shaochen Zhang, Haoran Geng, Chunrui Han, Zheng Ge, Li Yi, Kaisheng Ma

This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM) designed for embodied interaction, exploring a universal 3D object understanding with 3D point clouds and languages.

3D geometry • 3D Object Captioning +14

DreamLLM: Synergistic Multimodal Comprehension and Creation

1 code implementation • 20 Sep 2023 • Runpei Dong, Chunrui Han, Yuang Peng, Zekun Qi, Zheng Ge, Jinrong Yang, Liang Zhao, Jianjian Sun, HongYu Zhou, Haoran Wei, Xiangwen Kong, Xiangyu Zhang, Kaisheng Ma, Li Yi

This paper presents DreamLLM, a learning framework that first achieves versatile Multimodal Large Language Models (MLLMs) empowered with the frequently overlooked synergy between multimodal comprehension and creation.

multimodal generation • Visual Question Answering +2

VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation

2 code implementations • NeurIPS 2023 • Zekun Qi, Muzhou Yu, Runpei Dong, Kaisheng Ma

VPP leverages structured voxel representation in the proposed Voxel Semantic Generator and the sparsity of unstructured point representation in the Point Upsampler, enabling efficient generation of multi-category objects.

3D Generation • 8k

ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning

no code implementations • 18 Jul 2023 • Liang Zhao, En Yu, Zheng Ge, Jinrong Yang, Haoran Wei, HongYu Zhou, Jianjian Sun, Yuang Peng, Runpei Dong, Chunrui Han, Xiangyu Zhang

Based on precise referring instruction, we propose ChatSpot, a unified end-to-end multimodal large language model that supports diverse forms of interactivity including mouse clicks, drag-and-drop, and drawing boxes, which provides a more flexible and seamless interactive experience.

Instruction Following • Language Modeling +3

CORSD: Class-Oriented Relational Self Distillation

no code implementations • 28 Apr 2023 • Muzhou Yu, Sia Huat Tan, Kailu Wu, Runpei Dong, Linfeng Zhang, Kaisheng Ma

Knowledge distillation is an effective model compression method, but existing approaches have some limitations: (1) feature-based distillation methods focus only on distilling the feature map and fail to transfer the relations among data examples; (2) relational distillation methods are either limited to handcrafted functions for relation extraction, such as the L2 norm, or weak at modeling inter- and intra-class relations.

Knowledge Distillation • Model Compression +2

Exploring Recurrent Long-term Temporal Fusion for Multi-view 3D Perception

no code implementations • 10 Mar 2023 • Chunrui Han, Jinrong Yang, Jianjian Sun, Zheng Ge, Runpei Dong, HongYu Zhou, Weixin Mao, Yuang Peng, Xiangyu Zhang

In this paper, we explore an embarrassingly simple long-term recurrent fusion strategy built upon LSS-based methods and find that it already enjoys the merits of both sides, i.e., rich long-term information and an efficient fusion pipeline.

motion prediction • object-detection +1

CLIP-FO3D: Learning Free Open-world 3D Scene Representations from 2D Dense CLIP

no code implementations • 8 Mar 2023 • Junbo Zhang, Runpei Dong, Kaisheng Ma

Training a 3D scene understanding model requires complicated human annotations, which are laborious to collect and result in a model that only encodes closed-set object semantics.

Scene Understanding • Semantic Segmentation

Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining

5 code implementations • 5 Feb 2023 • Zekun Qi, Runpei Dong, Guofan Fan, Zheng Ge, Xiangyu Zhang, Kaisheng Ma, Li Yi

This motivates us to learn 3D representations by sharing the merits of both paradigms, which is non-trivial due to the pattern difference between the two paradigms.

3D Point Cloud Linear Classification • Decoder +3

Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?

4 code implementations • 16 Dec 2022 • Runpei Dong, Zekun Qi, Linfeng Zhang, Junbo Zhang, Jianjian Sun, Zheng Ge, Li Yi, Kaisheng Ma

The success of deep learning heavily relies on large-scale data with comprehensive labels, which is more expensive and time-consuming to fetch in 3D compared to 2D images or natural languages.

Few-Shot 3D Point Cloud Classification • Knowledge Distillation +1

Contrastive Deep Supervision

1 code implementation • 12 Jul 2022 • Linfeng Zhang, Xin Chen, Junbo Zhang, Runpei Dong, Kaisheng Ma

The success of deep learning is usually accompanied by the growth in neural network depth.

Contrastive Learning • Fine-Grained Image Classification +4

Region-aware Knowledge Distillation for Efficient Image-to-Image Translation

no code implementations • 25 May 2022 • Linfeng Zhang, Xin Chen, Runpei Dong, Kaisheng Ma

In this paper, we propose Region-aware Knowledge Distillation (ReKo) to compress image-to-image translation models.

Contrastive Learning • image-classification +2

PointDistiller: Structured Knowledge Distillation Towards Efficient and Compact 3D Detection

1 code implementation • CVPR 2023 • Linfeng Zhang, Runpei Dong, Hung-Shuo Tai, Kaisheng Ma

The remarkable breakthroughs in point cloud representation learning have boosted their usage in real-world applications such as self-driving cars and virtual reality.

3D Object Detection • Knowledge Distillation +4

Finding the Task-Optimal Low-Bit Sub-Distribution in Deep Neural Networks

1 code implementation • 30 Dec 2021 • Runpei Dong, Zhanhong Tan, Mengdi Wu, Linfeng Zhang, Kaisheng Ma

In addition, an efficient deployment flow for mobile CPUs is developed, achieving up to 7.46$\times$ inference acceleration on an octa-core ARM CPU.

CPU • image-classification +5

Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention

1 code implementation • 3 Nov 2021 • Sia Huat Tan, Runpei Dong, Kaisheng Ma

Inspired by this observation, we propose an end-to-end trainable Multi-Glimpse Network (MGNet), which aims to tackle the challenges of high computation and the lack of robustness based on a recurrent downsampled attention mechanism.
