Search Results for author: YuChao Gu

Found 18 papers, 10 papers with code

Long-Context Autoregressive Video Modeling with Next-Frame Prediction

1 code implementation • 25 Mar 2025 • YuChao Gu, Weijia Mao, Mike Zheng Shou

Long-context autoregressive modeling has significantly advanced language generation, but video generation still struggles to fully utilize extended temporal contexts.

Text Generation · Video Generation
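To make the next-frame-prediction framing concrete, here is a minimal PyTorch sketch of teacher-forced training for an autoregressive frame predictor. It illustrates the general objective only, not FAR's implementation; the causal-transformer model, the dimensions, and the use of pre-computed frame embeddings are all assumptions.

    import torch
    import torch.nn as nn

    # Illustrative next-frame predictor: a causal transformer over
    # per-frame embeddings, NOT the paper's actual architecture.
    class NextFramePredictor(nn.Module):
        def __init__(self, frame_dim=256, n_layers=4, n_heads=8):
            super().__init__()
            layer = nn.TransformerEncoderLayer(
                d_model=frame_dim, nhead=n_heads, batch_first=True)
            self.backbone = nn.TransformerEncoder(layer, num_layers=n_layers)
            self.head = nn.Linear(frame_dim, frame_dim)

        def forward(self, frames):                      # (B, T, frame_dim)
            causal = nn.Transformer.generate_square_subsequent_mask(frames.size(1))
            h = self.backbone(frames, mask=causal, is_causal=True)
            return self.head(h)                         # predicted next embeddings

    model = NextFramePredictor()
    video = torch.randn(2, 16, 256)                     # 16 frame embeddings
    pred = model(video[:, :-1])                         # frames 0..14 predict 1..15
    loss = nn.functional.mse_loss(pred, video[:, 1:])   # teacher-forced objective
    loss.backward()

Long-context variants of this setup mainly change how far back the attention window reaches, which is the axis the paper targets.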

Edit Transfer: Learning Image Editing via Vision In-Context Relations

no code implementations • 17 Mar 2025 • Lan Chen, Qi Mao, YuChao Gu, Mike Zheng Shou

We introduce a new setting, Edit Transfer, where a model learns a transformation from just a single source-target example and applies it to a new query image.

In-Context Learning · Relation +1
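As a rough illustration of a vision in-context setup, one common formulation tiles the exemplar pair and the query into a grid and asks a generative model to complete the missing panel. The grid layout and the commented `inpaint_model` call below are illustrative assumptions, not the paper's actual interface.

    import torch

    def make_incontext_grid(src, tgt, query):       # each (3, H, W)
        blank = torch.zeros_like(query)
        top = torch.cat([src, tgt], dim=2)          # source -> edited source
        bottom = torch.cat([query, blank], dim=2)   # query  -> to be completed
        return torch.cat([top, bottom], dim=1)      # (3, 2H, 2W) composite

    grid = make_incontext_grid(torch.rand(3, 256, 256),
                               torch.rand(3, 256, 256),
                               torch.rand(3, 256, 256))
    # edited = inpaint_model(grid, mask="bottom-right")  # hypothetical model call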

ROICtrl: Boosting Instance Control for Visual Generation

no code implementations • 27 Nov 2024 • YuChao Gu, Yipin Zhou, Yunfan Ye, Yixin Nie, Licheng Yu, Pingchuan Ma, Kevin Qinghong Lin, Mike Zheng Shou

Natural language often struggles to accurately associate positional and attribute information with multiple instances, which limits current text-based visual generation models to simpler compositions featuring only a few dominant instances.

Attribute · object-detection +1

EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models

1 code implementation • 9 Oct 2024 • Rui Zhao, Hangjie Yuan, Yujie Wei, Shiwei Zhang, YuChao Gu, Lingmin Ran, Xiang Wang, Zhangjie Wu, Junhao Zhang, Yingya Zhang, Mike Zheng Shou

Our experiments with extensive data indicate that a model trained on data generated by the advanced model can approximate its generation capability.

Text-to-Image Generation
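The core recipe the abstract describes (train a student on a teacher's generations) can be sketched in a few lines; `teacher_generate` below is a stand-in for the advanced model's API, and the paper additionally uses large vision-language models to curate this data, which the sketch omits.

    import torch

    # Stand-in for querying an advanced text-to-image model.
    teacher_generate = lambda prompt: torch.rand(3, 512, 512)

    prompts = ["a red sports car", "a snowy street at dusk", "a portrait photo"]
    synthetic = [(p, teacher_generate(p)) for p in prompts]  # (prompt, image) pairs

    # The student model is then trained on `synthetic` with its usual
    # objective (e.g. a diffusion denoising loss per pair).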

DragAnything: Motion Control for Anything using Entity Representation

2 code implementations • 12 Mar 2024 • Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object · Video Generation
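A minimal sketch of what an "entity representation" could look like: pool the generator's features inside the entity's mask into a single vector and pair it with a drag trajectory as the control signal. The shapes, the mean-pooling choice, and the commented conditioning step are assumptions, not DragAnything's actual code.

    import torch

    def entity_embedding(feature_map, mask):
        # feature_map: (C, H, W) generator features; mask: (H, W) bool
        feats = feature_map[:, mask]        # (C, N) features inside the entity
        return feats.mean(dim=1)            # (C,) pooled entity vector

    C, H, W = 320, 64, 64
    feats = torch.randn(C, H, W)
    mask = torch.zeros(H, W, dtype=torch.bool)
    mask[20:40, 20:40] = True
    entity = entity_embedding(feats, mask)
    trajectory = torch.tensor([[32., 32.], [34., 30.], [37., 28.]])  # (T, 2) drag points
    # cond = (entity, trajectory)  # fed to the video generator (illustrative)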

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

no code implementations • CVPR 2024 • Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, YuChao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control.

Video Editing

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

no code implementations • 18 Dec 2023 • Qi Mao, Lan Chen, YuChao Gu, Zhen Fang, Mike Zheng Shou

Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions.

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

no code implementations • CVPR 2024 • YuChao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang

In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework that exploits semantic point correspondences, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.

Video Editing
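The sparse-point idea is easy to sketch: track a handful of semantic points on the source subject and penalize the edited video for deviating from those trajectories. This toy loss and the shapes used are illustrative assumptions rather than the paper's actual optimization.

    import torch

    def point_guidance_loss(pred_points, source_points):
        # pred_points / source_points: (T, K, 2) trajectories of K semantic
        # points over T frames; keep the edited subject on the source motion.
        return torch.nn.functional.mse_loss(pred_points, source_points)

    T, K = 16, 5                                # 16 frames, 5 semantic points
    source_traj = torch.randn(T, K, 2)          # tracked on the source subject
    pred_traj = torch.randn(T, K, 2, requires_grad=True)  # from the edited video
    loss = point_guidance_loss(pred_traj, source_traj)
    loss.backward()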

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

no code implementations • CVPR 2024 • Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, YuChao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou

To overcome this, we propose to introduce dynamic Neural Radiance Fields (NeRF) as an innovative video representation, where editing can be performed in 3D space and propagated to the entire video via the deformation field.

NeRF · Style Transfer +2
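A minimal sketch of the deformation-field mechanism: warp each frame's 3D sample points into a shared canonical space, so an edit applied once in canonical space propagates to every time step. The tiny MLPs below are stand-ins for the paper's dynamic NeRF, not its architecture.

    import torch
    import torch.nn as nn

    deform = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 3))
    canonical = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def render_point(x, t):
        # x: (N, 3) sample points at time t (scalar in [0, 1])
        ts = torch.full((x.size(0), 1), t)
        x_canonical = x + deform(torch.cat([x, ts], dim=1))  # warp to canonical
        return canonical(x_canonical)       # edited canonical color field

    colors = render_point(torch.randn(8, 3), t=0.5)  # same edit at any t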

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

1 code implementation • 12 Oct 2023 • Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.
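One standard way to adapt a frozen diffusion model to a small set of clips is low-rank adaptation (LoRA) of its attention projections; the sketch below shows that mechanism in isolation and is an assumption about the setup, not MotionDirector's exact implementation.

    import torch
    import torch.nn as nn

    class LoRALinear(nn.Module):
        def __init__(self, base: nn.Linear, rank=4, alpha=1.0):
            super().__init__()
            self.base = base
            for p in self.base.parameters():
                p.requires_grad_(False)      # freeze the pretrained weight
            self.down = nn.Linear(base.in_features, rank, bias=False)
            self.up = nn.Linear(rank, base.out_features, bias=False)
            nn.init.zeros_(self.up.weight)   # adapter starts as the identity
            self.scale = alpha / rank

        def forward(self, x):
            return self.base(x) + self.scale * self.up(self.down(x))

    proj = LoRALinear(nn.Linear(320, 320))   # e.g. a temporal-attention projection
    out = proj(torch.randn(2, 16, 320))      # only down/up receive gradients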

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

1 code implementation • 27 Sep 2023 • David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, YuChao Gu, Difei Gao, Mike Zheng Shou

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation · Video Alignment +1
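The pixel/latent split can be sketched as a two-stage pipeline: a pixel-space model produces a small, well-aligned video, and a latent-space model super-resolves it cheaply. Every component below is a dummy stand-in chosen only to make the data flow runnable, not Show-1's actual modules.

    import torch
    import torch.nn.functional as F

    pixel_model = lambda prompt: torch.rand(8, 3, 40, 64)      # low-res frames
    latent_model = lambda z, prompt: F.interpolate(z, scale_factor=4)

    class DummyVAE:
        def encode(self, x): return F.avg_pool2d(x, 8)
        def decode(self, z): return F.interpolate(z, scale_factor=8)

    vae = DummyVAE()

    def generate(prompt):
        low_res = pixel_model(prompt)            # pixel-space base generation
        z = latent_model(vae.encode(low_res), prompt)  # latent super-resolution
        return vae.decode(z)

    video = generate("a cat surfing")            # (8, 3, 160, 256) frames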

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation • NeurIPS 2023 • Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Dataset Generation · Decoder +7
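The labeling half of the pipeline can be sketched as a small perception decoder reading the generator's intermediate features and emitting dense annotations; the decoder architecture and feature shape below are illustrative assumptions, not DatasetDM's design.

    import torch
    import torch.nn as nn

    NUM_CLASSES = 21
    decoder = nn.Sequential(                       # tiny stand-in perception decoder
        nn.Conv2d(320, 128, 3, padding=1), nn.ReLU(),
        nn.Conv2d(128, NUM_CLASSES, 1))

    diffusion_feats = torch.randn(1, 320, 64, 64)  # UNet features for one sample
    logits = decoder(diffusion_feats)              # (1, 21, 64, 64) per-pixel scores
    mask = logits.argmax(dim=1)                    # synthetic segmentation annotation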

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder

1 code implementation • 13 May 2022 • YuChao Gu, Xintao Wang, Liangbin Xie, Chao Dong, Gen Li, Ying Shan, Ming-Ming Cheng

Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while maintaining fidelity comparable to previous methods.

Blind Face Restoration · Decoder +1
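The dictionary lookup at the heart of VQ-based restoration is a nearest-neighbor match against a learned codebook; the sketch below shows that quantization step with illustrative sizes, and it omits VQFR's parallel decoder and training losses.

    import torch

    codebook = torch.randn(1024, 256)             # 1024 learned HQ codes, dim 256
    feats = torch.randn(64 * 64, 256)             # degraded encoder features, flattened

    d = torch.cdist(feats, codebook)              # (4096, 1024) pairwise distances
    idx = d.argmin(dim=1)                         # nearest HQ code per location
    quantized = codebook[idx].view(64, 64, 256)   # clean "dictionary" features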
