Search Results for author: Menghan Xia

Found 26 papers, 19 papers with code

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation • 16 Feb 2024 • Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

2 code implementations • 17 Jan 2024 • Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

Based on this stronger coupling, we shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.

Ranked #1 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation • Video Generation

4,039

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

1 code implementation • 6 Dec 2023 • Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan

Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion.

Object • Video Generation

1,036

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

2 code implementations • 1 Dec 2023 • Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang, Yujiu Yang, Ying Shan

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.

Disentanglement • Text-to-Video Generation • +1

156

Sketch Video Synthesis

1 code implementation • 26 Nov 2023 • Yudian Zheng, Xiaodong Cun, Menghan Xia, Chi-Man Pun

Understanding semantic intricacies and high-level concepts is essential in image sketch generation, and this challenge becomes even more formidable when applied to the domain of videos.

Video Editing

175

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations • 30 Oct 2023 • Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Ranked #3 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation • Video Generation

4,039

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

3 code implementations • 23 Oct 2023 • Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress.

Video Generation

312

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

1 code implementation • 18 Oct 2023 • Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan

Animating a still image offers an engaging visual experience.

Image Animation

1,562

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation • 11 Oct 2023 • Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

431

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

1 code implementation • 13 Jul 2023 • Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen

For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.

Retrieval • Video Generation • +2

232

Taming Reversible Halftoning via Predictive Luminance

no code implementations • 14 Jun 2023 • Cheuk-Kit Lau, Menghan Xia, Tien-Tsin Wong

Furthermore, to tackle the conflict between blue-noise quality and restoration accuracy in our novel base method, we propose a predictor-embedded approach to offload predictable information from the network, which in our case is the luminance information that can be approximated from the halftone pattern.

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation • Video Generation

TaleCrafter: Interactive Story Visualization with Multiple Characters

1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

Accurate story visualization requires several necessary elements, such as identity consistency across frames, alignment between plain text and visual content, and a reasonable layout of objects in images.

Story Visualization • Text-to-Image Generation

238

CoordFill: Efficient High-Resolution Image Inpainting via Parameterized Coordinate Querying

1 code implementation • 15 Mar 2023 • Weihuang Liu, Xiaodong Cun, Chi-Man Pun, Menghan Xia, Yong Zhang, Jue Wang

Thanks to the proposed structure, we only need to encode the high-resolution image at a relatively low resolution to capture a larger receptive field.

Image Inpainting • Vocal Bursts Intensity Prediction

CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior

1 code implementation • CVPR 2023 • Jinbo Xing, Menghan Xia, Yuechen Zhang, Xiaodong Cun, Jue Wang, Tien-Tsin Wong

In this paper, we propose to cast speech-driven facial animation as a code query task in a finite proxy space of the learned codebook, which effectively promotes the vividness of the generated motions by reducing the cross-modal mapping uncertainty.

Ranked #4 on 3D Face Animation on BEAT2

3D Face Animation • regression

456

Disentangled Image Colorization via Global Anchors

1 code implementation • SIGGRAPH 2022 • Menghan Xia, WenBo Hu, Tien-Tsin Wong, Jue Wang

Our key insight is that several carefully located anchors could approximately represent the color distribution of an image, and conditioned on the anchor colors, we can predict the image color in a deterministic manner by utilizing internal correlation.

Colorization • Image Colorization

109

VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild

1 code implementation • 27 Nov 2022 • Kun Cheng, Xiaodong Cun, Yong Zhang, Menghan Xia, Fei Yin, Mingrui Zhu, Xuan Wang, Jue Wang, Nannan Wang

Our system disentangles this objective into three sequential tasks: (1) face video generation with a canonical expression; (2) audio-driven lip-sync; and (3) face enhancement for improving photo-realism.

Video Editing • Video Generation

5,542

PalGAN: Image Colorization with Palette Generative Adversarial Networks

1 code implementation • 20 Oct 2022 • Yi Wang, Menghan Xia, Lu Qi, Jing Shao, Yu Qiao

Multimodal ambiguity and color bleeding remain challenging in colorization.

Colorization • Image Colorization

Screentone-Preserved Manga Retargeting

no code implementations • 7 Mar 2022 • Minshan Xie, Menghan Xia, Xueting Liu, Tien-Tsin Wong

Fortunately, the rescaled manga shares the same region-wise screentone correspondences with the original manga, which enables us to simplify the screentone synthesis problem as an anchor-based proposals selection and rearrangement problem.

Translation

Invertible Tone Mapping with Selectable Styles

no code implementations • 9 Oct 2021 • Zhuming Zhang, Menghan Xia, Xueting Liu, Chengze Li, Tien-Tsin Wong

In this paper, we propose an invertible tone mapping method that converts the multi-exposure HDR to a true LDR (8-bit per color channel) and reserves the capability to accurately restore the original HDR from this invertible LDR.

Tone Mapping

Exploiting Aliasing for Manga Restoration

1 code implementation • CVPR 2021 • Minshan Xie, Menghan Xia, Tien-Tsin Wong

First, we predict the target resolution from the degraded manga via the Scale Estimation Network (SE-Net) with a spatial voting scheme.

152

A Learned Compact and Editable Light Field Representation

no code implementations • 21 Mar 2021 • Menghan Xia, Jose Echevarria, Minshan Xie, Tien-Tsin Wong

Light fields are 4D scene representations typically structured as arrays of views, or several directional samples per pixel in a single view.

Deep Halftoning With Reversible Binary Pattern

1 code implementation • ICCV 2021 • Menghan Xia, WenBo Hu, Xueting Liu, Tien-Tsin Wong

Existing halftoning algorithms usually drop colors and fine details when dithering color images with binary dot patterns, which makes it extremely difficult to recover the original information.

Enhance Convolutional Neural Networks with Noise Incentive Block

no code implementations • 9 Dec 2020 • Menghan Xia, Yi Wang, Chu Han, Tien-Tsin Wong

The Noise Incentive Block (NIB) serves as a generic plug-in for any CNN generation model.

Image Generation • Translation

Mononizing Binocular Videos

1 code implementation • 3 Sep 2020 • Wenbo Hu, Menghan Xia, Chi-Wing Fu, Tien-Tsin Wong

This paper presents the idea of mono-nizing binocular videos and a framework to effectively realize it.

Image and Video Processing • Graphics

Line-Based Multi-Label Energy Optimization for Fisheye Image Rectification and Calibration

no code implementations • CVPR 2015 • Mi Zhang, Jian Yao, Menghan Xia, Kai Li, Yi Zhang, Yaping Liu

Fisheye image rectification and estimation of intrinsic parameters for real scenes have been addressed in the literature by using line information on the distorted images.
