Search Results for author: Yingqing He

Found 23 papers, 15 papers with code

Fake it till You Make it: Reward Modeling as Discriminative Prediction

no code implementations • 16 Jun 2025 • Runtao Liu, Jiahao Zhan, Yingqing He, Chen Wei, Alan Yuille, Qifeng Chen

An effective reward model plays a pivotal role in reinforcement learning for post-training enhancement of visual generative models.

Large Motion Video Autoencoding with Cross-modal Video VAE

no code implementations • 23 Dec 2024 • Yazhou Xing, Yang Fei, Yingqing He, Jingye Chen, Jiaxin Xie, Xiaowei Chi, Qifeng Chen

Directly applying image VAEs to individual frames in isolation can result in temporal inconsistencies and suboptimal compression rates due to a lack of temporal compression.

Video Generation
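
Aside: the temporal-compression point in the snippet above is concrete enough to illustrate. Below is a minimal, hypothetical PyTorch sketch (not the paper's actual architecture; all names and shapes are assumptions) of a 3D-convolution block that downsamples along time, i.e., the operation that per-frame image VAEs lack.

```python
# Hypothetical sketch (assumed names/shapes): a 3D-conv block that
# halves the temporal axis, adding the temporal compression that
# per-frame image VAEs lack.
import torch
import torch.nn as nn

class TemporalDownBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # stride=(2, 1, 1) downsamples time only, fusing neighboring
        # frames while leaving spatial resolution untouched.
        self.conv = nn.Conv3d(channels, channels, kernel_size=3,
                              stride=(2, 1, 1), padding=1)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time, height, width)
        return self.act(self.conv(x))

x = torch.randn(1, 4, 16, 32, 32)      # 16 latent frames
print(TemporalDownBlock(4)(x).shape)   # torch.Size([1, 4, 8, 32, 32])
```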

VideoDPO: Omni-Preference Alignment for Video Diffusion Generation

no code implementations • CVPR 2025 • Runtao Liu, HaoYu Wu, Zheng Ziqiang, Chen Wei, Yingqing He, Renjie Pi, Qifeng Chen

Unlike previous image alignment methods that focus solely on either (i) visual quality or (ii) semantic alignment between text and videos, we comprehensively consider both dimensions and construct a preference score accordingly, which we term the OmniScore.

Image Generation · Text-to-Video Generation +1
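
Aside: as a rough illustration of the snippet above, here is a minimal sketch of what an OmniScore-style preference score could look like. The equal-weight blend and the [0, 1] normalization are assumptions for the sake of the example, not VideoDPO's published formulation.

```python
# Hypothetical sketch of an OmniScore-style preference score; the
# equal-weight blend is an assumption, not VideoDPO's published formula.
def omni_score(visual_quality: float, text_alignment: float,
               weight: float = 0.5) -> float:
    """Blend two per-video scores, both assumed normalized to [0, 1]."""
    return weight * visual_quality + (1.0 - weight) * text_alignment

# Build a preference pair for DPO-style training: the higher-scoring
# sample is "chosen", the other "rejected".
score_a = omni_score(visual_quality=0.82, text_alignment=0.70)
score_b = omni_score(visual_quality=0.55, text_alignment=0.91)
chosen, rejected = ("a", "b") if score_a >= score_b else ("b", "a")
print(chosen, rejected)  # a b
```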

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

1 code implementation • 30 Jul 2024 • Xiaowei Chi, Yatian Wang, Aosong Cheng, Pengjun Fang, Zeyue Tian, Yingqing He, Zhaoyang Liu, Xingqun Qi, Jiahao Pan, Rongyu Zhang, Mengfei Li, Ruibin Yuan, Yanbing Jiang, Wei Xue, Wenhan Luo, Qifeng Chen, Shanghang Zhang, Qifeng Liu, Yike Guo

To fill this gap, we present MMTrail, a large-scale multi-modality video-language dataset incorporating more than 20M trailer clips with visual captions, and 2M high-quality clips with multimodal captions.

Audio Generation · Image to Video Generation +2

FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models

1 code implementation • 24 Jun 2024 • Haonan Qiu, Zhaoxi Chen, Zhouxia Wang, Yingqing He, Menghan Xia, Ziwei Liu

Diffusion models have demonstrated remarkable capability in video generation, which further sparks interest in introducing trajectory control into the generation process.

Video Generation

Follow-Your-Emoji: Fine-Controllable and Expressive Freestyle Portrait Animation

no code implementations • 4 Jun 2024 • Yue Ma, Hongyu Liu, Hongfa Wang, Heng Pan, Yingqing He, Junkun Yuan, Ailing Zeng, Chengfei Cai, Heung-Yeung Shum, Wei Liu, Qifeng Chen

We present Follow-Your-Emoji, a diffusion-based framework for portrait animation, which animates a reference portrait with target landmark sequences.

Portrait Animation

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

no code implementations • CVPR 2024 • Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.

Audio Generation · Denoising

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation • 16 Feb 2024 • Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation • 5 Dec 2023 • Yue Ma, Xiaodong Cun, Sen Liang, Jinbo Xing, Yingqing He, Chenyang Qi, Siran Chen, Qifeng Chen

Despite its simplicity, our method is the first to demonstrate video property editing from a pre-trained text-to-image model.

Video Editing · Video Generation

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations • 30 Oct 2023 • Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the provided reference image, preserving its content, structure, and style.

Text-to-Video Generation · Video Generation

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

3 code implementations • 23 Oct 2023 • Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress.

Video Generation

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation • 11 Oct 2023 • Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation · Video Generation
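
Aside: the snippet above describes a common pattern worth sketching, i.e., extending a frozen image diffusion model with temporal modules. Below is a minimal, hypothetical example of a temporal self-attention layer with a residual connection; all names and shapes are assumptions, not Make-Your-Video's actual code.

```python
# Hypothetical sketch of the "temporal modules" pattern: temporal
# self-attention with a residual, inserted into a frozen image model.
# Names and shapes are assumptions, not Make-Your-Video's actual code.
import torch
import torch.nn as nn

class TemporalAttention(nn.Module):
    def __init__(self, dim: int, heads: int = 8):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, frames, tokens, dim) -- attend across frames only,
        # treating each spatial token as an independent sequence.
        b, f, t, d = x.shape
        h = x.permute(0, 2, 1, 3).reshape(b * t, f, d)
        n = self.norm(h)
        h, _ = self.attn(n, n, n)
        h = h.reshape(b, t, f, d).permute(0, 2, 1, 3)
        return x + h  # residual connection around the new temporal layer

x = torch.randn(2, 16, 64, 320)         # 16 frames, 64 spatial tokens
print(TemporalAttention(320)(x).shape)  # torch.Size([2, 16, 64, 320])
```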

TaleCrafter: Interactive Story Visualization with Multiple Characters

1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

Accurate story visualization requires several elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images.

Layout Generation · Story Visualization +2

Latent Video Diffusion Models for High-Fidelity Long Video Generation

1 code implementation • 23 Nov 2022 • Yingqing He, Tianyu Yang, Yong Zhang, Ying Shan, Qifeng Chen

Diffusion models have shown remarkable results recently but require significant computational resources.

Denoising · Image Generation +3

Interpreting Class Conditional GANs with Channel Awareness

no code implementations • 21 Mar 2022 • Yingqing He, Zhiyi Zhang, Jiapeng Zhu, Yujun Shen, Qifeng Chen

To describe such a phenomenon, we propose channel awareness, which quantitatively characterizes how a single channel contributes to the final synthesis.

Unsupervised Portrait Shadow Removal via Generative Priors

1 code implementation • 7 Aug 2021 • Yingqing He, Yazhou Xing, Tianjia Zhang, Qifeng Chen

Qualitative and quantitative experiments on a real-world portrait shadow dataset demonstrate that our approach achieves comparable performance with supervised shadow removal methods.

Shadow Removal · Unsupervised Semantic Segmentation
