Search Results for author: Yuechen Zhang

Found 9 papers, 8 papers with code

Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models

2 code implementations 27 Mar 2024 Yanwei Li, Yuechen Zhang, Chengyao Wang, Zhisheng Zhong, Yixin Chen, Ruihang Chu, Shaoteng Liu, Jiaya Jia

We try to narrow the gap by mining the potential of VLMs for better performance and any-to-any workflow from three aspects, i.e., high-resolution visual tokens, high-quality data, and VLM-guided generation.

Image Comprehension Visual Dialog +1

Prompt Highlighter: Interactive Control for Multi-Modal LLMs

1 code implementation 7 Dec 2023 Yuechen Zhang, Shengju Qian, Bohao Peng, Shu Liu, Jiaya Jia

Without tuning on LLaVA-v1.5, our method secured 70.7 in the MMBench test and 1552.5 in MME-perception.

Text Generation

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations 1 Jun 2023 Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then extended to video generation with the introduction of temporal modules.

Image Generation Video Generation

Real-World Image Variation by Aligning Diffusion Inversion Chain

2 code implementations NeurIPS 2023 Yuechen Zhang, Jinbo Xing, Eric Lo, Jiaya Jia

Our pipeline enhances the generation quality of image variations by aligning the image generation process to the source image's inversion chain.

Image-Variation Semantic Similarity +2

Video-P2P: Video Editing with Cross-attention Control

1 code implementation 8 Mar 2023 Shaoteng Liu, Yuechen Zhang, Wenbo Li, Zhe Lin, Jiaya Jia

This paper presents Video-P2P, a novel framework for real-world video editing with cross-attention control.

Image Generation Video Editing +1

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

1 code implementation CVPR 2023 Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.

3D Face Animation Regression

PCL: Proxy-Based Contrastive Learning for Domain Generalization

1 code implementation CVPR 2022 Xufeng Yao, Yang Bai, Xinyun Zhang, Yuechen Zhang, Qi Sun, Ran Chen, Ruiyu Li, Bei Yu

Domain generalization refers to the problem of training a model from a collection of different source domains that can directly generalize to the unseen target domains.

Contrastive Learning Domain Generalization
