Search Results for author: Chunwei Wang

Found 15 papers, 2 papers with code

ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement

no code implementations • 2 Apr 2025 • Runhui Huang, Chunwei Wang, Junwei Yang, Guansong Lu, Yunlong Yuan, Jianhua Han, Lu Hou, Wei Zhang, Lanqing Hong, Hengshuang Zhao, Hang Xu

We present ILLUME+, which leverages dual visual tokenization and a diffusion decoder to improve both deep semantic understanding and high-fidelity image generation. (A hedged sketch of the dual-tokenization idea follows this entry.)

Decoder · Image Generation · +1
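A minimal, hypothetical sketch of dual visual tokenization, assuming one coarse "semantic" tokenizer and one finer "pixel" tokenizer, each mapping image patches to discrete codebook ids via nearest-neighbor lookup. The module names, patch sizes, and codebook size are illustrative assumptions, and the diffusion refinement stage is omitted.

```python
# Hypothetical dual visual tokenization (illustrative, not the authors' code):
# a coarse "semantic" tokenizer and a finer "pixel" tokenizer each map image
# patches to discrete codebook ids; a diffusion decoder (omitted here) would
# refine reconstructions from these ids.
import torch
import torch.nn as nn

class DualTokenizer(nn.Module):
    def __init__(self, dim=256, codebook_size=8192):
        super().__init__()
        self.semantic_enc = nn.Conv2d(3, dim, kernel_size=16, stride=16)  # coarse grid
        self.pixel_enc = nn.Conv2d(3, dim, kernel_size=8, stride=8)       # finer grid
        self.codebook = nn.Embedding(codebook_size, dim)

    def quantize(self, feats):
        # nearest codebook entry per spatial location -> discrete token ids
        flat = feats.flatten(2).transpose(1, 2)                       # (B, N, dim)
        dists = torch.cdist(flat, self.codebook.weight.unsqueeze(0))  # (B, N, K)
        return dists.argmin(dim=-1)                                   # (B, N)

    def forward(self, img):
        return (self.quantize(self.semantic_enc(img)),
                self.quantize(self.pixel_enc(img)))

tok = DualTokenizer()
sem_ids, pix_ids = tok(torch.randn(1, 3, 256, 256))
print(sem_ids.shape, pix_ids.shape)  # 256 semantic ids vs. 1024 pixel-level ids
```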

SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation

no code implementations • 9 Mar 2025 • Zisheng Chen, Chunwei Wang, Xiuwei Chen, Hang Xu, Jianhua Han, Xiaodan Liang

We present SemHiTok, a unified image tokenizer built on a semantic-guided hierarchical codebook, which provides consistent discrete feature representations for multimodal understanding and generation tasks.
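A hedged sketch of one possible two-level quantizer: a coarse semantic codebook is applied first, and a fine codebook then quantizes the residual detail, so the same discrete representation can serve understanding (semantic ids) and generation (semantic plus fine ids). The residual formulation and all sizes are my assumptions, not details from the paper.

```python
# Hypothetical two-level (semantic -> texture) quantizer, one minimal reading
# of a "semantic-guided hierarchical codebook"; names and sizes are illustrative.
import torch
import torch.nn as nn

class HierarchicalQuantizer(nn.Module):
    def __init__(self, dim=256, coarse_k=1024, fine_k=4096):
        super().__init__()
        self.coarse = nn.Embedding(coarse_k, dim)  # semantic level
        self.fine = nn.Embedding(fine_k, dim)      # texture / detail level

    def forward(self, feats):                      # feats: (B, N, dim)
        c_ids = torch.cdist(feats, self.coarse.weight.unsqueeze(0)).argmin(-1)
        residual = feats - self.coarse(c_ids)      # detail left after semantics
        f_ids = torch.cdist(residual, self.fine.weight.unsqueeze(0)).argmin(-1)
        return c_ids, f_ids

q = HierarchicalQuantizer()
c, f = q(torch.randn(2, 196, 256))
print(c.shape, f.shape)  # one coarse id and one fine id per patch
```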

Brick-Diffusion: Generating Long Videos with Brick-to-Wall Denoising

no code implementations • 6 Jan 2025 • Yunlong Yuan, Yuanfan Guo, Chunwei Wang, Hang Xu, Li Zhang

Training models for long video generation demands significant computational power and extensive data, which limits most video diffusion models to a small number of frames.

Denoising · Video Generation

ILLUME: Illuminating Your LLMs to See, Draw, and Self-Enhance

no code implementations • 9 Dec 2024 • Chunwei Wang, Guansong Lu, Junwei Yang, Runhui Huang, Jianhua Han, Lu Hou, Wei Zhang, Hang Xu

In this paper, we introduce ILLUME, a unified multimodal large language model (MLLM) that seamlessly integrates multimodal understanding and generation capabilities within a single large language model through a unified next-token prediction formulation. (A hedged sketch of this unified formulation follows below.)

Image Generation · Language Modeling · +4
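A minimal sketch of a unified next-token formulation, assuming text tokens and discrete image tokens share one vocabulary so a single autoregressive model predicts both modalities. The model sizes and the offset-based vocabulary merge are illustrative assumptions, not the paper's architecture.

```python
# Illustrative unified next-token prediction: text ids and image-codebook ids
# live in one vocabulary, and a single causal transformer predicts either kind.
import torch
import torch.nn as nn
import torch.nn.functional as F

TEXT_VOCAB, IMAGE_VOCAB = 32000, 8192
VOCAB = TEXT_VOCAB + IMAGE_VOCAB            # image token ids are offset past text ids

class UnifiedLM(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(dim, VOCAB)   # one head scores both modalities

    def forward(self, ids):                 # ids may interleave text and image tokens
        mask = nn.Transformer.generate_square_subsequent_mask(ids.size(1))
        return self.head(self.blocks(self.embed(ids), mask=mask))

lm = UnifiedLM()
seq = torch.randint(0, VOCAB, (1, 12))      # a mixed text/image token sequence
logits = lm(seq)
loss = F.cross_entropy(logits[:, :-1].reshape(-1, VOCAB), seq[:, 1:].reshape(-1))
print(loss.item())                          # standard next-token objective
```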

UNIT: Unifying Image and Text Recognition in One Vision Encoder

no code implementations • 6 Sep 2024 • Yi Zhu, Yanpeng Zhou, Chunwei Wang, Yang Cao, Jianhua Han, Lu Hou, Hang Xu

Starting with a vision encoder pre-trained with image recognition tasks, UNIT introduces a lightweight language decoder for predicting text outputs and a lightweight vision decoder to prevent catastrophic forgetting of the original image encoding capabilities. (A sketch of this two-decoder layout follows below.)

Decoder · Optical Character Recognition (OCR)
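A hypothetical sketch of one encoder feeding two lightweight decoders: a text head for OCR-style prediction and a vision head that reconstructs the encoder's features so the original recognition ability is not forgotten. The module choices, dimensions, and the reconstruction target are my assumptions.

```python
# Illustrative one-encoder / two-decoder layout (not the UNIT implementation):
# a shared vision encoder, a lightweight text head, and a lightweight vision
# head trained to reproduce the original features as an anti-forgetting signal.
import torch
import torch.nn as nn
import torch.nn.functional as F

class UNITSketch(nn.Module):
    def __init__(self, dim=256, vocab=1000):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, kernel_size=16, stride=16)
        self.text_decoder = nn.Linear(dim, vocab)   # lightweight language head
        self.vision_decoder = nn.Linear(dim, dim)   # reconstructs encoder features

    def forward(self, img):
        feats = self.encoder(img).flatten(2).transpose(1, 2)  # (B, N, dim)
        return self.text_decoder(feats), self.vision_decoder(feats), feats

model = UNITSketch()
text_logits, recon, feats = model(torch.randn(1, 3, 224, 224))
# joint objective: a text loss (omitted) plus a feature-reconstruction loss
# against frozen pretrained features (approximated here by feats.detach())
aux = F.mse_loss(recon, feats.detach())
print(text_logits.shape, aux.item())
```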

HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models

no code implementations • 11 Jul 2024 • Runhui Huang, Xinpeng Ding, Chunwei Wang, Jianhua Han, Yulong Liu, Hengshuang Zhao, Hang Xu, Lu Hou, Wei Zhang, Xiaodan Liang

High-resolution inputs enable Large Vision-Language Models (LVLMs) to discern finer visual details, enhancing their comprehension capabilities.

Position

Reason2Drive: Towards Interpretable and Chain-based Reasoning for Autonomous Driving

1 code implementation • 6 Dec 2023 • Ming Nie, Renyuan Peng, Chunwei Wang, Xinyue Cai, Jianhua Han, Hang Xu, Li Zhang

Large vision-language models (VLMs) have garnered increasing interest in autonomous driving, owing to their advanced capabilities in the complex reasoning tasks essential for highly autonomous vehicle behavior.

Autonomous Driving · Decision Making

Gaining Wisdom from Setbacks: Aligning Large Language Models via Mistake Analysis

no code implementations • 16 Oct 2023 • Kai Chen, Chunwei Wang, Kuo Yang, Jianhua Han, Lanqing Hong, Fei Mi, Hang Xu, Zhengying Liu, Wenyong Huang, Zhenguo Li, Dit-Yan Yeung, Lifeng Shang, Xin Jiang, Qun Liu

The rapid development of large language models (LLMs) has not only provided numerous opportunities but also presented significant challenges.

Instruction Following

PointAugmenting: Cross-Modal Augmentation for 3D Object Detection

no code implementations • CVPR 2021 • Chunwei Wang, Chao Ma, Ming Zhu, Xiaokang Yang

PointAugmenting decorates point clouds with corresponding point-wise CNN features extracted by pretrained 2D detection models, and then performs 3D object detection over the decorated point clouds. (An illustrative sketch of this decoration step follows below.)

3D Object Detection · Autonomous Driving · +4
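A hedged sketch of the decoration step, assuming a simple pinhole projection: each LiDAR point is projected into the image, the 2D CNN feature map is bilinearly sampled at that pixel, and the sampled feature is concatenated to the point. The calibration matrix, feature map, and sampling details here are stand-ins, not the paper's pipeline.

```python
# Illustrative point decoration: project points into the image, sample the 2D
# CNN feature map at those pixels, and append the features to each point.
import torch
import torch.nn.functional as F

def decorate_points(points_xyz, feat_map, proj):          # proj: 3x4 camera matrix
    ones = torch.ones(points_xyz.shape[0], 1)
    uvw = (proj @ torch.cat([points_xyz, ones], 1).T).T   # project to image plane
    uv = uvw[:, :2] / uvw[:, 2:].clamp(min=1e-6)          # perspective divide
    H, W = feat_map.shape[-2:]
    grid = torch.stack([uv[:, 0] / W, uv[:, 1] / H], -1) * 2 - 1  # to [-1, 1]
    sampled = F.grid_sample(feat_map[None], grid[None, None],
                            align_corners=False)          # (1, C, 1, N)
    return torch.cat([points_xyz, sampled[0, :, 0].T], 1) # (N, 3 + C)

pts = torch.rand(100, 3) * 10                             # fake LiDAR points
feats = torch.randn(64, 32, 32)                           # fake 2D CNN features
P = torch.randn(3, 4)                                     # fake calibration
print(decorate_points(pts, feats, P).shape)               # torch.Size([100, 67])
```

Points projecting outside the image simply receive zero features here (grid_sample's default padding), which is one common way to handle out-of-view points.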
