1 code implementation • 22 May 2025 • Yuechen Zhang, Jinbo Xing, Bin Xia, Shaoteng Liu, Bohao Peng, Xin Tao, Pengfei Wan, Eric Lo, Jiaya Jia
Despite the remarkable generation quality of video Diffusion Transformer (DiT) models, their practical deployment is severely hindered by extensive computational requirements.
1 code implementation • 7 Jan 2025 • Yuechen Zhang, Yaoyang Liu, Bin Xia, Bohao Peng, Zexin Yan, Eric Lo, Jiaya Jia
We present Magic Mirror, a framework for generating identity-preserved videos with cinematic-level quality and dynamic motion.
no code implementations • CVPR 2025 • Bin Xia, Yuechen Zhang, Jingyao Li, Chengyao Wang, Yitong Wang, Xinglong Wu, Bei Yu, Jiaya Jia
We begin by analyzing existing frameworks and the requirements of downstream tasks, proposing a unified framework that integrates both T2I models and various editing tasks.
1 code implementation • 12 Dec 2024 • Zhisheng Zhong, Chengyao Wang, Yuqi Liu, Senqiao Yang, Longxiang Tang, Yuechen Zhang, Jingyao Li, Tianyuan Qu, Yanwei Li, Yukang Chen, Shaozuo Yu, Sitong Wu, Eric Lo, Shu Liu, Jiaya Jia
As Multi-modal Large Language Models (MLLMs) evolve, expanding beyond single-domain capabilities is essential to meet the demands for more versatile and efficient AI.
Ranked #1 on
Visual Question Answering (VQA)
on EgoSchema
1 code implementation • 12 Aug 2024 • Bohao Peng, Jian Wang, Yuechen Zhang, Wenbo Li, Ming-Chang Yang, Jiaya Jia
In this paper, we propose ControlNeXt: a powerful and efficient method for controllable image and video generation.
no code implementations • 24 Jun 2024 • Shuwei Shi, Wenbo Li, Yuechen Zhang, Jingwen He, Biao Gong, Yinqiang Zheng
Diffusion models excel at producing high-quality images; however, scaling to higher resolutions, such as 4K, often results in over-smoothed content, structural distortions, and repetitive patterns.
2 code implementations • 27 Mar 2024 • Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia
We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i. e., high-resolution visual tokens, high-quality data, and VLM-guided generation.
1 code implementation • CVPR 2024 • Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia
Without tuning on LLaVA-v1. 5, our method secured 70. 7 in the MMBench test and 1552. 5 in MME-perception.
no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong
Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.
2 code implementations • NeurIPS 2023 • Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia
Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.
1 code implementation • CVPR 2024 • Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia
This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.
1 code implementation • CVPR 2023 • Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong
In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.
Ranked #4 on
3D Face Animation
on BEAT2
1 code implementation • CVPR 2023 • Yuechen Zhang, Zexin He, Jinbo Xing, Xufeng Yao, Jiaya Jia
We propose a ray registration process based on the stylized reference view to obtain pseudo-ray supervision in novel views.
1 code implementation • CVPR 2022 • Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu
Domain generalization refers to the problem of training a model from a collection of different source domains that can directly generalize to the unseen target domains.
Ranked #23 on
Domain Generalization
on PACS
1 code implementation • CVPR 2022 • Tiancheng Shen, Yuechen Zhang, Lu Qi, Jason Kuen, Xingyu Xie, Jianlong Wu, Zhe Lin, Jiaya Jia
To segment 4K or 6K ultra high-resolution images needs extra computation consideration in image segmentation.