Search Results for author: Xintao Wang

Found 84 papers, 58 papers with code

Evaluating Character Understanding of Large Language Models via Character Profiling from Fictional Works

1 code implementation • 19 Apr 2024 • Xinfeng Yuan, Siyu Yuan, Yuhan Cui, Tianhe Lin, Xintao Wang, Rui Xu, Jiangjie Chen, Deqing Yang

These results strongly validate the character understanding capability of LLMs.

Paper
Code

Character is Destiny: Can Large Language Models Simulate Persona-Driven Decisions in Role-Playing?

no code implementations • 18 Apr 2024 • Rui Xu, Xintao Wang, Jiangjie Chen, Siyu Yuan, Xinfeng Yuan, Jiaqing Liang, Zulong Chen, Xiaoqing Dong, Yanghua Xiao

Can Large Language Models substitute humans in making important decisions?

Decision Making Retrieval

Paper
Add Code

InstantMesh: Efficient 3D Mesh Generation from a Single Image with Sparse-view Large Reconstruction Models

2 code implementations • 10 Apr 2024 • Jiale Xu, Weihao Cheng, Yiming Gao, Xintao Wang, Shenghua Gao, Ying Shan

We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a single image, featuring state-of-the-art generation quality and significant training scalability.

Image to 3D

1,226

Paper
Code

SurveyAgent: A Conversational System for Personalized and Efficient Research Survey

no code implementations • 9 Apr 2024 • Xintao Wang, Jiangjie Chen, Nianqi Li, Lida Chen, Xinfeng Yuan, Wei Shi, Xuyang Ge, Rui Xu, Yanghua Xiao

In the rapidly advancing research fields such as AI, managing and staying abreast of the latest scientific literature has become a significant challenge for researchers.

Management Question Answering

Paper
Add Code

SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model

no code implementations • 15 Mar 2024 • Tao Wu, XueWei Li, Zhongang Qi, Di Hu, Xintao Wang, Ying Shan, Xi Li

Controllable spherical panoramic image generation holds substantial applicative potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce a novel framework of SphereDiffusion to address these unique challenges, for better generating high-quality and precisely controllable spherical panoramic images. For the spherical distortion characteristic, we embed the semantics of the distorted object with text encoding, then explicitly construct the relationship with text-object correspondence to better use the pre-trained knowledge of the planar images. Meanwhile, we employ a deformable technique to mitigate the semantic deviation in latent space caused by spherical distortion. For the spherical geometry characteristic, in virtue of spherical rotation invariance, we improve the data diversity and optimization objectives in the training process, enabling the model to better learn the spherical geometry characteristic. Furthermore, we enhance the denoising process of the diffusion model, enabling it to effectively use the learned geometric characteristic to ensure the boundary continuity of the generated images. With these specific techniques, experiments on Structured3D dataset show that SphereDiffusion significantly improves the quality of controllable spherical image generation and relatively reduces around 35% FID on average.

Denoising Image Generation

Paper
Add Code

BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion

2 code implementations • 11 Mar 2024 • Xuan Ju, Xian Liu, Xintao Wang, Yuxuan Bian, Ying Shan, Qiang Xu

Image inpainting, the process of restoring corrupted images, has seen significant advancements with the advent of diffusion models (DMs).

Image Inpainting

846

Paper
Code

Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners

no code implementations • 27 Feb 2024 • Yazhou Xing, Yingqing He, Zeyue Tian, Xintao Wang, Qifeng Chen

Thus, instead of training the giant models from scratch, we propose to bridge the existing strong models with a shared latent representation space.

Audio Generation Denoising

Paper
Add Code

Make a Cheap Scaling: A Self-Cascade Diffusion Model for Higher-Resolution Adaptation

1 code implementation • 16 Feb 2024 • Lanqing Guo, Yingqing He, Haoxin Chen, Menghan Xia, Xiaodong Cun, YuFei Wang, Siyu Huang, Yong Zhang, Xintao Wang, Qifeng Chen, Ying Shan, Bihan Wen

Diffusion models have proven to be highly effective in image and video generation; however, they still face composition challenges when generating images of varying sizes due to single-scale training data.

Video Generation

Paper
Code

DiffEditor: Boosting Accuracy and Flexibility on Diffusion-based Image Editing

1 code implementation • 4 Feb 2024 • Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

Large-scale Text-to-Image (T2I) diffusion models have revolutionized image generation over the last few years.

Image Generation

644

Paper
Code

Scaling Up to Excellence: Practicing Model Scaling for Photo-Realistic Image Restoration In the Wild

no code implementations • 24 Jan 2024 • Fanghua Yu, Jinjin Gu, Zheyuan Li, JinFan Hu, Xiangtao Kong, Xintao Wang, Jingwen He, Yu Qiao, Chao Dong

We introduce SUPIR (Scaling-UP Image Restoration), a groundbreaking image restoration method that harnesses generative prior and the power of model scaling up.

Descriptive Image Restoration

Paper
Add Code

VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models

2 code implementations • 17 Jan 2024 • Haoxin Chen, Yong Zhang, Xiaodong Cun, Menghan Xia, Xintao Wang, Chao Weng, Ying Shan

Based on this stronger coupling, we shift the distribution to higher quality without motion degradation by finetuning spatial modules with high-quality images, resulting in a generic high-quality video model.

Ranked #1 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation

4,061

Paper
Code

Towards A Better Metric for Text-to-Video Generation

no code implementations • 15 Jan 2024 • Jay Zhangjie Wu, Guian Fang, HaoNing Wu, Xintao Wang, Yixiao Ge, Xiaodong Cun, David Junhao Zhang, Jia-Wei Liu, YuChao Gu, Rui Zhao, Weisi Lin, Wynne Hsu, Ying Shan, Mike Zheng Shou

Experiments on the TVGE dataset demonstrate the superiority of the proposed T2VScore on offering a better metric for text-to-video generation.

Text-to-Video Generation Video Alignment +1

Paper
Add Code

ConcEPT: Concept-Enhanced Pre-Training for Language Models

no code implementations • 11 Jan 2024 • Xintao Wang, Zhouhong Gu, Jiaqing Liang, Dakuan Lu, Yanghua Xiao, Wei Wang

In this paper, we propose ConcEPT, which stands for Concept-Enhanced Pre-Training for language models, to infuse conceptual knowledge into PLMs.

Entity Linking Entity Typing

Paper
Add Code

SmartEdit: Exploring Complex Instruction-based Image Editing with Multimodal Large Language Models

1 code implementation • 11 Dec 2023 • Yuzhou Huang, Liangbin Xie, Xintao Wang, Ziyang Yuan, Xiaodong Cun, Yixiao Ge, Jiantao Zhou, Chao Dong, Rui Huang, Ruimao Zhang, Ying Shan

Both quantitative and qualitative results on this evaluation dataset indicate that our SmartEdit surpasses previous methods, paving the way for the practical application of complex instruction-based image editing.

125

Paper
Code

PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding

1 code implementation • 7 Dec 2023 • Zhen Li, Mingdeng Cao, Xintao Wang, Zhongang Qi, Ming-Ming Cheng, Ying Shan

Recent advances in text-to-image generation have made remarkable progress in synthesizing realistic human photos conditioned on given text prompts.

Ranked #6 on Diffusion Personalization Tuning Free on AgeDB

Diffusion Personalization Tuning Free Text-to-Image Generation

8,230

Paper
Code

MotionCtrl: A Unified and Flexible Motion Controller for Video Generation

1 code implementation • 6 Dec 2023 • Zhouxia Wang, Ziyang Yuan, Xintao Wang, Tianshui Chen, Menghan Xia, Ping Luo, Ying Shan

Therefore, this paper presents MotionCtrl, a unified and flexible motion controller for video generation designed to effectively and independently control camera and object motion.

Object Video Generation

1,047

Paper
Code

AnimateZero: Video Diffusion Models are Zero-Shot Image Animators

1 code implementation • 6 Dec 2023 • Jiwen Yu, Xiaodong Cun, Chenyang Qi, Yong Zhang, Xintao Wang, Ying Shan, Jian Zhang

For appearance control, we borrow intermediate latents and their features from the text-to-image (T2I) generation for ensuring the generated first frame is equal to the given generated image.

Image Animation Video Generation

345

Paper
Code

MagicStick: Controllable Video Editing via Control Handle Transformations

1 code implementation • 5 Dec 2023 • Yue Ma, Xiaodong Cun, Yingqing He, Chenyang Qi, Xintao Wang, Ying Shan, Xiu Li, Qifeng Chen

Yet succinct, our method is the first method to show the ability of video property editing from the pre-trained text-to-image model.

Video Editing Video Generation

Paper
Code

X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model

no code implementations • 4 Dec 2023 • Lingmin Ran, Xiaodong Cun, Jia-Wei Liu, Rui Zhao, Song Zijie, Xintao Wang, Jussi Keppo, Mike Zheng Shou

To enhance the guidance ability of X-Adapter, we employ a null-text training strategy for the upgraded model.

Denoising

Paper
Add Code

StyleCrafter: Enhancing Stylized Text-to-Video Generation with Style Adapter

2 code implementations • 1 Dec 2023 • Gongye Liu, Menghan Xia, Yong Zhang, Haoxin Chen, Jinbo Xing, Xintao Wang, Yujiu Yang, Ying Shan

To address these challenges, we introduce StyleCrafter, a generic method that enhances pre-trained T2V models with a style control adapter, enabling video generation in any style by providing a reference image.

Disentanglement Text-to-Video Generation +1

157

Paper
Code

Source Prompt: Coordinated Pre-training of Language Models on Diverse Corpora from Multiple Sources

no code implementations • 16 Nov 2023 • Yipei Xu, Dakuan Lu, Jiaqing Liang, Xintao Wang, Yipeng Geng, Yingsi Xin, Hengkui Wu, Ken Chen, ruiji zhang, Yanghua Xiao

Pre-trained language models (PLMs) have established the new paradigm in the field of NLP.

Paper
Add Code

VideoCrafter1: Open Diffusion Models for High-Quality Video Generation

3 code implementations • 30 Oct 2023 • Haoxin Chen, Menghan Xia, Yingqing He, Yong Zhang, Xiaodong Cun, Shaoshu Yang, Jinbo Xing, Yaofang Liu, Qifeng Chen, Xintao Wang, Chao Weng, Ying Shan

The I2V model is designed to produce videos that strictly adhere to the content of the provided reference image, preserving its content, structure, and style.

Ranked #3 on Text-to-Video Generation on EvalCrafter Text-to-Video (ECTV) Dataset (using extra training data)

Text-to-Video Generation Video Generation

4,061

Paper
Code

CustomNet: Zero-shot Object Customization with Variable-Viewpoints in Text-to-Image Diffusion Models

no code implementations • 30 Oct 2023 • Ziyang Yuan, Mingdeng Cao, Xintao Wang, Zhongang Qi, Chun Yuan, Ying Shan

As a result, our CustomNet ensures enhanced identity preservation and generates diverse, harmonious outputs.

Novel View Synthesis Object +1

Paper
Add Code

InCharacter: Evaluating Personality Fidelity in Role-Playing Agents through Psychological Interviews

2 code implementations • 27 Oct 2023 • Xintao Wang, Yunze Xiao, Jen-tse Huang, Siyu Yuan, Rui Xu, Haoran Guo, Quan Tu, Yaying Fei, Ziang Leng, Wei Wang, Jiangjie Chen, Cheng Li, Yanghua Xiao

This paper, instead, introduces a novel perspective to evaluate the personality fidelity of RPAs with psychological scales.

1,061

Paper
Code

New Boolean satisfiability problem heuristic strategy: Minimal Positive Negative Product Strategy

no code implementations • 26 Oct 2023 • Qun Zhao, Xintao Wang, Menghui Yang

This study presents a novel heuristic algorithm called the "Minimal Positive Negative Product Strategy" to guide the CDCL algorithm in solving the Boolean satisfiability problem.

Paper
Add Code

FreeNoise: Tuning-Free Longer Video Diffusion via Noise Rescheduling

3 code implementations • 23 Oct 2023 • Haonan Qiu, Menghan Xia, Yong Zhang, Yingqing He, Xintao Wang, Ying Shan, Ziwei Liu

With the availability of large-scale video datasets and the advances of diffusion models, text-driven video generation has achieved substantial progress.

Video Generation

316

Paper
Code

DynamiCrafter: Animating Open-domain Images with Video Diffusion Priors

1 code implementation • 18 Oct 2023 • Jinbo Xing, Menghan Xia, Yong Zhang, Haoxin Chen, Wangbo Yu, Hanyuan Liu, Xintao Wang, Tien-Tsin Wong, Ying Shan

Animating a still image offers an engaging visual experience.

Image Animation

1,602

Paper
Code

EvalCrafter: Benchmarking and Evaluating Large Video Generation Models

1 code implementation • 17 Oct 2023 • Yaofang Liu, Xiaodong Cun, Xuebo Liu, Xintao Wang, Yong Zhang, Haoxin Chen, Yang Liu, Tieyong Zeng, Raymond Chan, Ying Shan

For video generation, various open-sourced models and public-available services have been developed to generate high-quality videos.

Benchmarking Language Modelling +4

Paper
Code

Unifying Image Processing as Visual Prompting Question Answering

no code implementations • 16 Oct 2023 • Yihao Liu, Xiangyu Chen, Xianzheng Ma, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

To address this issue, we propose a universal model for general image processing that covers image restoration, image enhancement, image feature extraction tasks, etc.

Image Enhancement Image Restoration +4

Paper
Add Code

ScaleCrafter: Tuning-free Higher-Resolution Visual Generation with Diffusion Models

1 code implementation • 11 Oct 2023 • Yingqing He, Shaoshu Yang, Haoxin Chen, Xiaodong Cun, Menghan Xia, Yong Zhang, Xintao Wang, Ran He, Qifeng Chen, Ying Shan

Our work also suggests that a pre-trained diffusion model trained on low-resolution images can be directly used for high-resolution visual generation without further tuning, which may provide insights for future research on ultra-high-resolution image and video synthesis.

Image Generation

434

Paper
Code

Making LLaMA SEE and Draw with SEED Tokenizer

1 code implementation • 2 Oct 2023 • Yuying Ge, Sijie Zhao, Ziyun Zeng, Yixiao Ge, Chen Li, Xintao Wang, Ying Shan

We identify two crucial design principles: (1) Image tokens should be independent of 2D physical patch positions and instead be produced with a 1D causal dependency, exhibiting intrinsic interdependence that aligns with the left-to-right autoregressive prediction mechanism in LLMs.

multimodal generation

464

Paper
Code

Can Large Language Models Understand Real-World Complex Instructions?

2 code implementations • 17 Sep 2023 • Qianyu He, Jie Zeng, Wenhao Huang, Lina Chen, Jin Xiao, Qianxi He, Xunzhe Zhou, Lida Chen, Xintao Wang, Yuncheng Huang, Haoning Ye, Zihan Li, Shisong Chen, Yikai Zhang, Zhouhong Gu, Jiaqing Liang, Yanghua Xiao

To bridge this gap, we propose CELLO, a benchmark for evaluating LLMs' ability to follow complex instructions systematically.

Paper
Code

HAT: Hybrid Attention Transformer for Image Restoration

2 code implementations • 11 Sep 2023 • Xiangyu Chen, Xintao Wang, Wenlong Zhang, Xiangtao Kong, Yu Qiao, Jiantao Zhou, Chao Dong

In the training stage, we additionally adopt a same-task pre-training strategy to further exploit the potential of the model for further improvement.

Image Compression Image Denoising +2

1,072

Paper
Code

StyleAdapter: A Single-Pass LoRA-Free Model for Stylized Image Generation

no code implementations • 4 Sep 2023 • Zhouxia Wang, Xintao Wang, Liangbin Xie, Zhongang Qi, Ying Shan, Wenping Wang, Ping Luo

StyleAdapter can generate high-quality images that match the content of the prompts and adopt the style of the references (even for unseen styles) in a single pass, which is more flexible and efficient than previous methods.

Image Generation

Paper
Add Code

KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases

no code implementations • 17 Aug 2023 • Xintao Wang, Qianwen Yang, Yongting Qiu, Jiaqing Liang, Qianyu He, Zhouhong Gu, Yanghua Xiao, Wei Wang

Large language models (LLMs) have demonstrated impressive impact in the field of natural language processing, but they still struggle with several issues regarding, such as completeness, timeliness, faithfulness and adaptability.

Retrieval World Knowledge

Paper
Add Code

GET3D--: Learning GET3D from Unconstrained Image Collections

no code implementations • 27 Jul 2023 • Fanghua Yu, Xintao Wang, Zheyuan Li, Yan-Pei Cao, Ying Shan, Chao Dong

While generative models have shown potential in creating 3D textured shapes from 2D images, their applicability in 3D industries is limited due to the lack of a well-defined camera distribution in real-world scenarios, resulting in low-quality shapes.

Paper
Add Code

Planting a SEED of Vision in Large Language Model

1 code implementation • 16 Jul 2023 • Yuying Ge, Yixiao Ge, Ziyun Zeng, Xintao Wang, Ying Shan

Research on image tokenizers has previously reached an impasse, as frameworks employing quantized visual tokens have lost prominence due to subpar performance and convergence in multimodal comprehension (compared to BLIP-2, etc.)

Language Modelling Large Language Model +1

464

Paper
Code

Animate-A-Story: Storytelling with Retrieval-Augmented Video Generation

1 code implementation • 13 Jul 2023 • Yingqing He, Menghan Xia, Haoxin Chen, Xiaodong Cun, Yuan Gong, Jinbo Xing, Yong Zhang, Xintao Wang, Chao Weng, Ying Shan, Qifeng Chen

For the first module, we leverage an off-the-shelf video retrieval system and extract video depths as motion structure.

Retrieval Video Generation +2

234

Paper
Code

DragonDiffusion: Enabling Drag-style Manipulation on Diffusion Models

1 code implementation • 5 Jul 2023 • Chong Mou, Xintao Wang, Jiechong Song, Ying Shan, Jian Zhang

Specifically, we construct classifier guidance based on the strong correspondence of intermediate features in the diffusion model.

Object

644

Paper
Code

DeSRA: Detect and Delete the Artifacts of GAN-based Real-World Super-Resolution Models

1 code implementation • 5 Jul 2023 • Liangbin Xie, Xintao Wang, Xiangyu Chen, Gen Li, Ying Shan, Jiantao Zhou, Chao Dong

After detecting the artifact regions, we develop a finetune procedure to improve GAN-based SR models with a few samples, so that they can deal with similar types of artifacts in more unseen real data.

Image Super-Resolution

113

Paper
Code

DreamDiffusion: Generating High-Quality Images from Brain EEG Signals

1 code implementation • 29 Jun 2023 • Yunpeng Bai, Xintao Wang, Yan-Pei Cao, Yixiao Ge, Chun Yuan, Ying Shan

This paper introduces DreamDiffusion, a novel method for generating high-quality images directly from brain electroencephalogram (EEG) signals, without the need to translate thoughts into text.

EEG Image Generation

393

Paper
Code

InstructP2P: Learning to Edit 3D Point Clouds with Text Instructions

no code implementations • 12 Jun 2023 • Jiale Xu, Xintao Wang, Yan-Pei Cao, Weihao Cheng, Ying Shan, Shenghua Gao

Enhancing AI systems to perform tasks following human instructions can significantly boost productivity.

Language Modelling Large Language Model

Paper
Add Code

Make-Your-Video: Customized Video Generation Using Textual and Structural Guidance

no code implementations • 1 Jun 2023 • Jinbo Xing, Menghan Xia, Yuxin Liu, Yuechen Zhang, Yong Zhang, Yingqing He, Hanyuan Liu, Haoxin Chen, Xiaodong Cun, Xintao Wang, Ying Shan, Tien-Tsin Wong

Our method, dubbed Make-Your-Video, involves joint-conditional video generation using a Latent Diffusion Model that is pre-trained for still image synthesis and then promoted for video generation with the introduction of temporal modules.

Image Generation Video Generation

Paper
Add Code

Inserting Anybody in Diffusion Models via Celeb Basis

1 code implementation • NeurIPS 2023 • Ge Yuan, Xiaodong Cun, Yong Zhang, Maomao Li, Chenyang Qi, Xintao Wang, Ying Shan, Huicheng Zheng

Empowered by the proposed celeb basis, the new identity in our customized model showcases a better concept combination ability than previous personalization methods.

240

Paper
Code

Mix-of-Show: Decentralized Low-Rank Adaptation for Multi-Concept Customization of Diffusion Models

2 code implementations • NeurIPS 2023 • YuChao Gu, Xintao Wang, Jay Zhangjie Wu, Yujun Shi, Yunpeng Chen, Zihan Fan, Wuyou Xiao, Rui Zhao, Shuning Chang, Weijia Wu, Yixiao Ge, Ying Shan, Mike Zheng Shou

Public large-scale text-to-image diffusion models, such as Stable Diffusion, have gained significant attention from the community.

Attribute

365

Paper
Code

TaleCrafter: Interactive Story Visualization with Multiple Characters

1 code implementation • 29 May 2023 • Yuan Gong, Youxin Pang, Xiaodong Cun, Menghan Xia, Yingqing He, Haoxin Chen, Longyue Wang, Yong Zhang, Xintao Wang, Ying Shan, Yujiu Yang

Accurate Story visualization requires several necessary elements, such as identity consistency across frames, the alignment between plain text and visual content, and a reasonable layout of objects in images.

Story Visualization Text-to-Image Generation

240

Paper
Code

MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

3 code implementations • ICCV 2023 • Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, XiaoHu Qie, Yinqiang Zheng

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results.

Ranked #11 on Text-based Image Editing on PIE-Bench

Text-based Image Editing

629

Paper
Code

Follow Your Pose: Pose-Guided Text-to-Video Generation using Pose-Free Videos

1 code implementation • 3 Apr 2023 • Yue Ma, Yingqing He, Xiaodong Cun, Xintao Wang, Siran Chen, Ying Shan, Xiu Li, Qifeng Chen

Generating text-editable and pose-controllable character videos have an imperious demand in creating various digital human.

Text-to-Image Generation Text-to-Video Generation +1

1,007

Paper
Code

FateZero: Fusing Attentions for Zero-shot Text-based Video Editing

1 code implementation • ICCV 2023 • Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, Qifeng Chen

We also have a better zero-shot shape-aware editing ability based on the text-to-video model.

Attribute Text-to-Video Editing +2

1,041

Paper
Code

T2I-Adapter: Learning Adapters to Dig out More Controllable Ability for Text-to-Image Diffusion Models

2 code implementations • 16 Feb 2023 • Chong Mou, Xintao Wang, Liangbin Xie, Yanze Wu, Jian Zhang, Zhongang Qi, Ying Shan, XiaoHu Qie

In this paper, we aim to ``dig out" the capabilities that T2I models have implicitly learned, and then explicitly use them to control the generation more granularly.

Image Generation Style Transfer

3,147

Paper
Code

OSRT: Omnidirectional Image Super-Resolution with Distortion-aware Transformer

1 code implementation • CVPR 2023 • Fanghua Yu, Xintao Wang, Mingdeng Cao, Gen Li, Ying Shan, Chao Dong

Omnidirectional images (ODIs) have obtained lots of research interest for immersive experiences.

Data Augmentation ERP +1

Paper
Code

Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models

no code implementations • CVPR 2023 • Jiale Xu, Xintao Wang, Weihao Cheng, Yan-Pei Cao, Ying Shan, XiaoHu Qie, Shenghua Gao

Specifically, we first generate a high-quality 3D shape from the input text in the text-to-shape stage as a 3D shape prior.

Image Generation Text to 3D +1

Paper
Add Code

Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation

3 code implementations • ICCV 2023 • Jay Zhangjie Wu, Yixiao Ge, Xintao Wang, Weixian Lei, YuChao Gu, Yufei Shi, Wynne Hsu, Ying Shan, XiaoHu Qie, Mike Zheng Shou

To replicate the success of text-to-image (T2I) generation, recent works employ large-scale video datasets to train a text-to-video (T2V) generator.

Style Transfer Text-to-Video Generation +1

4,079

Paper
Code

Reference-based Image and Video Super-Resolution via C2-Matching

1 code implementation • 19 Dec 2022 • Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

To tackle these challenges, we propose C2-Matching in this work, which performs explicit robust matching crossing transformation and resolution.

Image Super-Resolution Reference-based Super-Resolution +2

191

Paper
Code

Mitigating Artifacts in Real-World Video Super-Resolution Models

1 code implementation • 14 Dec 2022 • Liangbin Xie, Xintao Wang, Shuwei Shi, Jinjin Gu, Chao Dong, Ying Shan

To aggregate a new hidden state that contains fewer artifacts from the hidden state pool, we devise a Selective Cross Attention (SCA) module, in which the attention between input features and each hidden state is calculated.

Video Super-Resolution

Paper
Code

MAPS-KB: A Million-scale Probabilistic Simile Knowledge Base

2 code implementations • 10 Dec 2022 • Qianyu He, Xintao Wang, Jiaqing Liang, Yanghua Xiao

The ability to understand and generate similes is an imperative step to realize human-level AI.

Paper
Code

Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis

no code implementations • 6 Dec 2022 • YuChao Gu, Xintao Wang, Yixiao Ge, Ying Shan, XiaoHu Qie, Mike Zheng Shou

Vector-Quantized (VQ-based) generative models usually consist of two basic components, i. e., VQ tokenizers and generative transformers.

Conditional Image Generation

Paper
Add Code

GLEAN: Generative Latent Bank for Image Super-Resolution and Beyond

1 code implementation • 29 Jul 2022 • Kelvin C. K. Chan, Xiangyu Xu, Xintao Wang, Jinwei Gu, Chen Change Loy

While most existing perceptual-oriented approaches attempt to generate realistic outputs through learning with adversarial loss, our method, Generative LatEnt bANk (GLEAN), goes beyond existing practices by directly leveraging rich and diverse priors encapsulated in a pre-trained GAN.

Colorization Image Colorization +2

6,564

Paper
Code

FaceFormer: Scale-aware Blind Face Restoration with Transformers

no code implementations • 20 Jul 2022 • Aijin Li, Gen Li, Lei Sun, Xintao Wang

Blind face restoration usually encounters with diverse scale face inputs, especially in the real world.

Blind Face Restoration

Paper
Add Code

Rethinking Alignment in Video Super-Resolution Transformers

1 code implementation • 18 Jul 2022 • Shuwei Shi, Jinjin Gu, Liangbin Xie, Xintao Wang, Yujiu Yang, Chao Dong

In this paper, we rethink the role of alignment in VSR Transformers and make several counter-intuitive observations.

Ranked #2 on Video Super-Resolution on Vid4 - 4x upscaling

Common Sense Reasoning Video Super-Resolution

101

Paper
Code

Language Models as Knowledge Embeddings

1 code implementation • 25 Jun 2022 • Xintao Wang, Qianyu He, Jiaqing Liang, Yanghua Xiao

In this paper, we propose LMKE, which adopts Language Models to derive Knowledge Embeddings, aiming at both enriching representations of long-tail entities and solving problems of prior description-based methods.

Ranked #3 on Link Prediction on WN18RR

Contrastive Learning Link Prediction +1

Paper
Code

AnimeSR: Learning Real-World Super-Resolution Models for Animation Videos

1 code implementation • 14 Jun 2022 • Yanze Wu, Xintao Wang, Gen Li, Ying Shan

This paper studies the problem of real-world video super-resolution (VSR) for animation videos, and reveals three key improvements for practical animation VSR.

Video Super-Resolution

311

Paper
Code

NTIRE 2022 Challenge on High Dynamic Range Imaging: Methods and Results

no code implementations • 25 May 2022 • Eduardo Pérez-Pellitero, Sibi Catley-Chandar, Richard Shaw, Aleš Leonardis, Radu Timofte, Zexin Zhang, Cen Liu, Yunbo Peng, Yue Lin, Gaocheng Yu, Jin Zhang, Zhe Ma, Hongbin Wang, Xiangyu Chen, Xintao Wang, Haiwei Wu, Lin Liu, Chao Dong, Jiantao Zhou, Qingsen Yan, Song Zhang, Weiye Chen, Yuhang Liu, Zhen Zhang, Yanning Zhang, Javen Qinfeng Shi, Dong Gong, Dan Zhu, Mengdi Sun, Guannan Chen, Yang Hu, Haowei Li, Baozhu Zou, Zhen Liu, Wenjie Lin, Ting Jiang, Chengzhi Jiang, Xinpeng Li, Mingyan Han, Haoqiang Fan, Jian Sun, Shuaicheng Liu, Juan Marín-Vega, Michael Sloth, Peter Schneider-Kamp, Richard Röttger, Chunyang Li, Long Bao, Gang He, Ziyao Xu, Li Xu, Gen Zhan, Ming Sun, Xing Wen, Junlin Li, Shuang Feng, Fei Lei, Rui Liu, Junxiang Ruan, Tianhong Dai, Wei Li, Zhan Lu, Hengyan Liu, Peian Huang, Guangyu Ren, Yonglin Luo, Chang Liu, Qiang Tu, Fangya Li, Ruipeng Gang, Chenghua Li, Jinjing Li, Sai Ma, Chenming Liu, Yizhen Cao, Steven Tel, Barthelemy Heyrman, Dominique Ginhac, Chul Lee, Gahyeon Kim, Seonghyun Park, An Gia Vien, Truong Thanh Nhat Mai, Howoon Yoon, Tu Vo, Alexander Holston, Sheir Zaheer, Chan Y. Park

The challenge is composed of two tracks with an emphasis on fidelity and complexity constraints: In Track 1, participants are asked to optimize objective fidelity scores while imposing a low-complexity constraint (i. e. solutions can not exceed a given number of operations).

Image Restoration Vocal Bursts Intensity Prediction

Paper
Add Code

VQFR: Blind Face Restoration with Vector-Quantized Dictionary and Parallel Decoder

1 code implementation • 13 May 2022 • YuChao Gu, Xintao Wang, Liangbin Xie, Chao Dong, Gen Li, Ying Shan, Ming-Ming Cheng

Equipped with the VQ codebook as a facial detail dictionary and the parallel decoder design, the proposed VQFR can largely enhance the restored quality of facial details while keeping the fidelity to previous methods.

Blind Face Restoration Quantization

301

Paper
Code

RepSR: Training Efficient VGG-style Super-Resolution Networks with Structural Re-Parameterization and Batch Normalization

no code implementations • 11 May 2022 • Xintao Wang, Chao Dong, Ying Shan

Extensive experiments demonstrate that our simple RepSR is capable of achieving superior performance to previous SR re-parameterization methods among different model sizes.

Super-Resolution

Paper
Add Code

Accelerating the Training of Video Super-Resolution Models

no code implementations • 10 May 2022 • Lijian Lin, Xintao Wang, Zhongang Qi, Ying Shan

In this work, we show that it is possible to gradually train video models from small to large spatial/temporal sizes, i. e., in an easy-to-hard manner.

Video Super-Resolution

Paper
Add Code

MM-RealSR: Metric Learning based Interactive Modulation for Real-World Super-Resolution

1 code implementation • 10 May 2022 • Chong Mou, Yanze Wu, Xintao Wang, Chao Dong, Jian Zhang, Ying Shan

Instead of using known degradation levels as explicit supervision to the interactive mechanism, we propose a metric learning strategy to map the unquantifiable degradation levels in real-world scenarios to a metric space, which is trained in an unsupervised manner.

Image Restoration Metric Learning +1

143

Paper
Code

Activating More Pixels in Image Super-Resolution Transformer

2 code implementations • CVPR 2023 • Xiangyu Chen, Xintao Wang, Jiantao Zhou, Yu Qiao, Chao Dong

In the training stage, we additionally adopt a same-task pre-training strategy to exploit the potential of the model for further improvement.

Ranked #1 on Image Super-Resolution on Set5 - 2x upscaling

Image Super-Resolution

1,072

Paper
Code

NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon

This challenge includes three tracks.

Super-Resolution

Paper
Code

Temporally Consistent Video Colorization with Deep Feature Propagation and Self-regularization Learning

1 code implementation • 9 Oct 2021 • Yihao Liu, Hengyuan Zhao, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Yu Qiao, Chao Dong

We address this problem from a new perspective, by jointly considering colorization and temporal consistency in a unified framework.

Colorization Image Colorization

Paper
Code

Towards Vivid and Diverse Image Colorization with Generative Color Prior

1 code implementation • ICCV 2021 • Yanze Wu, Xintao Wang, Yu Li, Honglun Zhang, Xun Zhao, Ying Shan

Codes are available at https://github. com/ToTheBeginning/GCP-Colorization.

Colorization Image Colorization

Paper
Code

Finding Discriminative Filters for Specific Degradations in Blind Super-Resolution

1 code implementation • NeurIPS 2021 • Liangbin Xie, Xintao Wang, Chao Dong, Zhongang Qi, Ying Shan

Unlike previous integral gradient methods, our FAIG aims at finding the most discriminative filters instead of input pixels/features for degradation removal in blind SR networks.

Blind Super-Resolution Super-Resolution

117

Paper
Code

Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data

8 code implementations • 22 Jul 2021 • Xintao Wang, Liangbin Xie, Chao Dong, Ying Shan

Though many attempts have been made in blind super-resolution to restore low-resolution images with unknown and complex degradations, they are still far from addressing general real-world degraded images.

Ranked #3 on Video Super-Resolution on MSU Video Upscalers: Quality Enhancement

Blind Super-Resolution Video Super-Resolution

26,006

Paper
Code

Robust Reference-based Super-Resolution via C2-Matching

1 code implementation • CVPR 2021 • Yuming Jiang, Kelvin C. K. Chan, Xintao Wang, Chen Change Loy, Ziwei Liu

However, performing local transfer is difficult because of two gaps between input and reference images: the transformation gap (e. g. scale and rotation) and the resolution gap (e. g. HR and LR).

Reference-based Super-Resolution

191

Paper
Code

Towards Real-World Blind Face Restoration with Generative Facial Prior

1 code implementation • CVPR 2021 • Xintao Wang, Yu Li, Honglun Zhang, Ying Shan

Blind face restoration usually relies on facial priors, such as facial geometry prior or reference prior, to restore realistic and faithful details.

Ranked #2 on Blind Face Restoration on CelebA-Test

Blind Face Restoration Video Super-Resolution

34,558

Paper
Code

Positional Encoding as Spatial Inductive Bias in GANs

no code implementations • CVPR 2021 • Rui Xu, Xintao Wang, Kai Chen, Bolei Zhou, Chen Change Loy

In this work, taking SinGAN and StyleGAN2 as examples, we show that such capability, to a large extent, is brought by the implicit positional encoding when using zero padding in the generators.

Image Manipulation Inductive Bias +1

Paper
Add Code

BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond

6 code implementations • CVPR 2021 • Kelvin C. K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy

Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension.

Ranked #2 on Video Super-Resolution on MSU Video Super Resolution Benchmark: Detail Restoration

Video Super-Resolution

6,564

Paper
Code

GLEAN: Generative Latent Bank for Large-Factor Image Super-Resolution

no code implementations • CVPR 2021 • Kelvin C. K. Chan, Xintao Wang, Xiangyu Xu, Jinwei Gu, Chen Change Loy

We show that pre-trained Generative Adversarial Networks (GANs), e. g., StyleGAN, can be used as a latent bank to improve the restoration quality of large-factor image super-resolution (SR).

Image Super-Resolution

Paper
Add Code

Understanding Deformable Alignment in Video Super-Resolution

no code implementations • 15 Sep 2020 • Kelvin C. K. Chan, Xintao Wang, Ke Yu, Chao Dong, Chen Change Loy

Aside from the contributions to deformable alignment, our formulation inspires a more flexible approach to introduce offset diversity to flow-based alignment, improving its performance.

Optical Flow Estimation Video Super-Resolution

Paper
Add Code

EDVR: Video Restoration with Enhanced Deformable Convolutional Networks

11 code implementations • 7 May 2019 • Xintao Wang, Kelvin C. K. Chan, Ke Yu, Chao Dong, Chen Change Loy

In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges.

Ranked #2 on Deblurring on REDS

Deblurring Video Enhancement +2

6,564

Paper
Code

Path-Restore: Learning Network Path Selection for Image Restoration

1 code implementation • 23 Apr 2019 • Ke Yu, Xintao Wang, Chao Dong, Xiaoou Tang, Chen Change Loy

To leverage this, we propose Path-Restore, a multi-path CNN with a pathfinder that can dynamically select an appropriate route for each image region.

Denoising Image Restoration +1

Paper
Code

Deep Network Interpolation for Continuous Imagery Effect Transition

2 code implementations • CVPR 2019 • Xintao Wang, Ke Yu, Chao Dong, Xiaoou Tang, Chen Change Loy

Deep convolutional neural network has demonstrated its capability of learning a deterministic mapping for the desired imagery effect.

Image Restoration Image-to-Image Translation +2

Paper
Code

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks

45 code implementations • 1 Sep 2018 • Xintao Wang, Ke Yu, Shixiang Wu, Jinjin Gu, Yihao Liu, Chao Dong, Chen Change Loy, Yu Qiao, Xiaoou Tang

To further enhance the visual quality, we thoroughly study three key components of SRGAN - network architecture, adversarial loss and perceptual loss, and improve each of them to derive an Enhanced SRGAN (ESRGAN).

Ranked #2 on Face Hallucination on FFHQ 512 x 512 - 16x upscaling