Search Results for author: YuChao Gu

Found 13 papers, 7 papers with code

DragAnything: Motion Control for Anything using Entity Representation

1 code implementation 12 Mar 2024 Weijia Wu, Zhuang Li, YuChao Gu, Rui Zhao, Yefei He, David Junhao Zhang, Mike Zheng Shou, Yan Li, Tingting Gao, Di Zhang

We introduce DragAnything, which utilizes an entity representation to achieve motion control for any object in controllable video generation.

Object Video Generation

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

no code implementations CVPR 2024 Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, YuChao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control.

Video Editing

MAG-Edit: Localized Image Editing in Complex Scenarios via Mask-Based Attention-Adjusted Guidance

no code implementations 18 Dec 2023 Qi Mao, Lan Chen, YuChao Gu, Zhen Fang, Mike Zheng Shou

Recent diffusion-based image editing approaches have exhibited impressive editing capabilities in images with simple compositions.

VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence

no code implementations CVPR 2024 YuChao Gu, Yipin Zhou, Bichen Wu, Licheng Yu, Jia-Wei Liu, Rui Zhao, Jay Zhangjie Wu, David Junhao Zhang, Mike Zheng Shou, Kevin Tang

In contrast to previous methods that rely on dense correspondences, we introduce the VideoSwap framework that exploits semantic point correspondences, inspired by our observation that only a small number of semantic points are necessary to align the subject's motion trajectory and modify its shape.

Video Editing

DynVideo-E: Harnessing Dynamic NeRF for Large-Scale Motion- and View-Change Human-Centric Video Editing

no code implementations CVPR 2024 Jia-Wei Liu, Yan-Pei Cao, Jay Zhangjie Wu, Weijia Mao, YuChao Gu, Rui Zhao, Jussi Keppo, Ying Shan, Mike Zheng Shou

To overcome this, we propose to introduce the dynamic Neural Radiance Fields (NeRF) as the innovative video representation, where the editing can be performed in the 3D spaces and propagated to the entire video via the deformation field.

Style Transfer Super-Resolution +1

MotionDirector: Motion Customization of Text-to-Video Diffusion Models

1 code implementation 12 Oct 2023 Rui Zhao, YuChao Gu, Jay Zhangjie Wu, David Junhao Zhang, Jiawei Liu, Weijia Wu, Jussi Keppo, Mike Zheng Shou

Given a set of video clips of the same motion concept, the task of Motion Customization is to adapt existing text-to-video diffusion models to generate videos with this motion.

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

1 code implementation 27 Sep 2023 David Junhao Zhang, Jay Zhangjie Wu, Jia-Wei Liu, Rui Zhao, Lingmin Ran, YuChao Gu, Difei Gao, Mike Zheng Shou

In this paper, we are the first to propose a hybrid model, dubbed Show-1, which marries pixel-based and latent-based VDMs for text-to-video generation.

Text-to-Video Generation Video Alignment +1

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

1 code implementation NeurIPS 2023 Weijia Wu, Yuzhong Zhao, Hao Chen, YuChao Gu, Rui Zhao, Yefei He, Hong Zhou, Mike Zheng Shou, Chunhua Shen

To showcase the power of the proposed approach, we generate datasets with rich dense pixel-wise labels for a wide range of downstream tasks, including semantic segmentation, instance segmentation, and depth estimation.

Decoder Depth Estimation +6

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

no code implementations CVPR 2024 YuChao Gu, Xintao Wang, Yixiao Ge, Ying Shan, XiaoHu Qie, Mike Zheng Shou

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i.e., VQ tokenizers and generative transformers.

Conditional Image Generation Decoder

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder

1 code implementation 13 May 2022 YuChao Gu, Xintao Wang, Liangbin Xie, Chao Dong, Gen Li, Ying Shan, Ming-Ming Cheng

Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping the fidelity to previous methods.

Blind Face Restoration Decoder +1
