Search Results for author: Yuechen Zhang

Found 15 papers, 12 papers with code

Training-Free Efficient Video Generation via Dynamic Token Carving

1 code implementation • 22 May 2025 • Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia

Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements.

Denoising Video Generation

Magic Mirror: ID-Preserved Video Generation in Video Diffusion Transformers

1 code implementation • 7 Jan 2025 • Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia

We present Magic Mirror, a framework for generating identity-preserved videos with cinematic-level quality and dynamic motion.

Diversity Text-to-Video Generation +1

DreamOmni: Unified Image Generation and Editing

no code implementations • CVPR 2025 • Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia

We begin by analyzing existing frameworks and the requirements of downstream tasks, proposing a unified framework that integrates both T2I models and various editing tasks.

Image Generation

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

1 code implementation • 12 Dec 2024 • Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia

As Multi-modal Large Language Models (MLLMs) evolve, expanding beyond single-domain capabilities is essential to meet the demands for more versatile and efficient AI.

EgoSchema +6

ControlNeXt: Powerful and Efficient Control for Image and Video Generation

1 code implementation • 12 Aug 2024 • Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia

In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation.

Video Generation

ResMaster: Mastering High-Resolution Image Generation via Structural and Fine-Grained Guidance

no code implementations • 24 Jun 2024 • Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng

Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns.

4k Denoising +3

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Classification Image Comprehension +4

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation • CVPR 2024 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 70.7 in the MMBench test and 1552.5 in MME-perception.

MME Text Generation

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation Video Generation

Real-World Image Variation by Aligning Diffusion Inversion Chain

2 code implementations • NeurIPS 2023 • Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.

Image-Variation Semantic Similarity +3

Video-P2P: Video Editing with Cross-attention Control

1 code implementation • CVPR 2024 • Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation Video Editing +1

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

1 code implementation • CVPR 2023 • Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.

3D Face Animation regression

PCL: Proxy-Based Contrastive Learning for Domain Generalization

1 code implementation • CVPR 2022 • Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu

Domain generalization refers to the problem of training a model from a collection of different source domains that can directly generalize to the unseen target domains.

Contrastive Learning Domain Generalization
