Search Results for author: Canyu Zhao

Found 6 papers, 4 papers with code

Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO

no code implementations • 27 May 2025 Muzhi Zhu, Hao Zhong, Canyu Zhao, Zongze Du, Zheng Huang, MingYu Liu, Hao Chen, Cheng Zou, Jingdong Chen, Ming Yang, Chunhua Shen

However, despite the importance of active perception in embodied intelligence, there is little to no exploration of how MLLMs can be equipped with or learn active perception capabilities.

Autonomous Driving · Decision Making · +2

Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

1 code implementation • 26 May 2025 Hao Zhong, Muzhi Zhu, Zongze Du, Zheng Huang, Canyu Zhao, MingYu Liu, Wen Wang, Hao Chen, Chunhua Shen

Long-horizon video-audio reasoning and fine-grained pixel understanding impose conflicting requirements on omnimodal models: dense temporal coverage demands many low-resolution frames, whereas precise grounding calls for high-resolution inputs.

Domain Generalization · Hallucination · +6

DICEPTION: A Generalist Diffusion Model for Visual Perceptual Tasks

1 code implementation • 24 Feb 2025 Canyu Zhao, MingYu Liu, Huanyi Zheng, Muzhi Zhu, Zhiyue Zhao, Hao Chen, Tong He, Chunhua Shen

We achieve results on par with SAM-vit-h using only 0.06% of their data (e.g., 600K vs. 1B pixel-level annotated images).

Conditional Image Generation · Semantic Segmentation

MovieDreamer: Hierarchical Generation for Coherent Long Visual Sequence

no code implementations • 23 Jul 2024 Canyu Zhao, MingYu Liu, Wen Wang, Weihua Chen, Fan Wang, Hao Chen, Bo Zhang, Chunhua Shen

Our approach utilizes autoregressive models for global narrative coherence, predicting sequences of visual tokens that are subsequently transformed into high-quality video frames through diffusion rendering.

Video Generation
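
Below is a minimal, hypothetical sketch (not the authors' released code) of the two-stage pipeline the MovieDreamer entry above describes: an autoregressive model predicts a sequence of visual tokens for each keyframe to maintain narrative coherence, and a diffusion renderer then decodes each token sequence into a frame. All names, sizes, and functions here (predict_next_token, diffusion_render, the codebook size, etc.) are illustrative placeholders.

```python
# Hypothetical sketch of an autoregressive-tokens -> diffusion-rendering pipeline.
from typing import List
import random

VOCAB_SIZE = 8192          # assumed visual-token codebook size
TOKENS_PER_KEYFRAME = 256  # assumed number of tokens per keyframe


def predict_next_token(context: List[int]) -> int:
    """Stand-in for the autoregressive model: samples p(token_t | token_<t, story)."""
    return random.randrange(VOCAB_SIZE)


def generate_keyframe_tokens(num_keyframes: int) -> List[List[int]]:
    """Roll out token sequences for all keyframes with a single shared context,
    so long-range (global narrative) coherence is handled at the token level."""
    context: List[int] = []
    keyframes: List[List[int]] = []
    for _ in range(num_keyframes):
        frame_tokens: List[int] = []
        for _ in range(TOKENS_PER_KEYFRAME):
            tok = predict_next_token(context)
            context.append(tok)
            frame_tokens.append(tok)
        keyframes.append(frame_tokens)
    return keyframes


def diffusion_render(frame_tokens: List[int]) -> str:
    """Stand-in for the diffusion renderer that turns tokens into a high-quality frame."""
    return f"frame<{len(frame_tokens)} tokens>"


if __name__ == "__main__":
    token_plan = generate_keyframe_tokens(num_keyframes=4)
    frames = [diffusion_render(tokens) for tokens in token_plan]
    print(frames)
```

The key design point illustrated is the division of labor: coherence across the long sequence lives in the token-level autoregression, while per-frame visual quality is delegated to the renderer.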

FreeCustom: Tuning-Free Customized Image Generation for Multi-Concept Composition

2 code implementations • CVPR 2024 Ganggui Ding, Canyu Zhao, Wen Wang, Zhen Yang, Zide Liu, Hao Chen, Chunhua Shen

Experiments show that the images produced by our method are consistent with the given concepts and better aligned with the input text.

Image Generation

AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort

1 code implementation • 19 Nov 2023 Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, Chunhua Shen

We empirically find that sparse control conditions, such as bounding boxes, are suitable for layout planning, while dense control conditions, e.g., sketches and keypoints, are suitable for generating high-quality image content.

Image Generation · Story Visualization
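
The following is a minimal, hypothetical sketch (not the authors' code) of the sparse-to-dense control idea described in the AutoStory entry above: sparse conditions (bounding boxes) plan the layout, and are then turned into dense conditions (sketches or keypoints) that guide the final image generation. Every function and class here (plan_layout, densify, generate_image, DenseCondition) is an illustrative placeholder.

```python
# Hypothetical sketch of sparse (layout) -> dense (content) control conditioning.
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized coordinates


@dataclass
class DenseCondition:
    kind: str       # e.g. "sketch" or "keypoints"
    region: Box     # region of the canvas this condition constrains


def plan_layout(prompt: str, subjects: List[str]) -> Dict[str, Box]:
    """Sparse stage: assign each subject a bounding box for layout planning.
    Here boxes are simply placed side by side as a placeholder."""
    n = max(len(subjects), 1)
    return {s: (i / n, 0.2, (i + 1) / n, 0.9) for i, s in enumerate(subjects)}


def densify(layout: Dict[str, Box]) -> List[DenseCondition]:
    """Dense stage: convert each box into a dense control signal
    (e.g. a sketch or keypoint map) for its region."""
    return [DenseCondition(kind="sketch", region=box) for box in layout.values()]


def generate_image(prompt: str, conditions: List[DenseCondition]) -> str:
    """Stand-in for a controllable diffusion generator."""
    return f"image conditioned on {len(conditions)} dense controls for '{prompt}'"


if __name__ == "__main__":
    story_prompt = "two friends walking in a park"
    layout = plan_layout(story_prompt, ["Alice", "Bob"])
    dense = densify(layout)
    print(generate_image(story_prompt, dense))
```

This mirrors the abstract's observation: coarse, easy-to-specify controls are enough for where things go, while dense controls carry the detail needed for high-quality content in each region.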
