no code implementations • 13 Mar 2025 • Hao He, Ceyuan Yang, Shanchuan Lin, Yinghao Xu, Meng Wei, Liangke Gui, Qi Zhao, Gordon Wetzstein, Lu Jiang, Hongsheng Li
This paper introduces CameraCtrl II, a framework that enables large-scale dynamic scene exploration through a camera-controlled video diffusion model.
no code implementations • 13 Mar 2025 • Yuwei Guo, Ceyuan Yang, Ziyan Yang, Zhibei Ma, Zhijie Lin, Zhenheng Yang, Dahua Lin, Lu Jiang
Recent advances in video generation can produce realistic, minute-long single-shot videos with scalable diffusion transformers.
no code implementations • 14 Jan 2025 • Shanchuan Lin, Xin Xia, Yuxi Ren, Ceyuan Yang, Xuefeng Xiao, Lu Jiang
The diffusion models are widely used for image and video generation, but their iterative generation process is slow and expansive.
no code implementations • 2 Jan 2025 • Jianyi Wang, Zhijie Lin, Meng Wei, Yang Zhao, Ceyuan Yang, Chen Change Loy, Lu Jiang
Video restoration poses non-trivial challenges in maintaining fidelity while recovering temporally consistent details from unknown degradations in the wild.
1 code implementation • 30 Dec 2024 • Qingyan Bai, Hao Ouyang, Yinghao Xu, Qiuyu Wang, Ceyuan Yang, Ka Leong Cheng, Yujun Shen, Qifeng Chen
As a verified need, consistent editing across in-the-wild images remains a technical challenge arising from various unmanageable factors, like object poses, lighting conditions, and photography environments.
no code implementations • 10 Oct 2024 • Zhengyang Liang, Hao He, Ceyuan Yang, Bo Dai
Diffusion transformers (DiT) have already achieved appealing synthesis and scaling properties in content recreation, e. g., image and video generation.
no code implementations • 28 May 2024 • Qihang Zhang, Yinghao Xu, Chaoyang Wang, Hsin-Ying Lee, Gordon Wetzstein, Bolei Zhou, Ceyuan Yang
This results in a lack of a unified approach to effectively control and manipulate scenes at the 3D level with different levels of granularity.
1 code implementation • 2 Apr 2024 • Hao He, Yinghao Xu, Yuwei Guo, Gordon Wetzstein, Bo Dai, Hongsheng Li, Ceyuan Yang
Controllability plays a crucial role in video generation since it allows users to create desired content.
1 code implementation • 21 Mar 2024 • Yinghao Xu, Zifan Shi, Wang Yifan, Hansheng Chen, Ceyuan Yang, Sida Peng, Yujun Shen, Gordon Wetzstein
We introduce GRM, a large-scale reconstructor capable of recovering a 3D asset from sparse-view images in around 0. 1s.
1 code implementation • 21 Feb 2024 • Qingyan Bai, Zifan Shi, Yinghao Xu, Hao Ouyang, Qiuyu Wang, Ceyuan Yang, Xuan Wang, Gordon Wetzstein, Yujun Shen, Qifeng Chen
Second, thanks to the powerful priors, our module could focus on the learning of editing-related variations, such that it manages to handle various types of editing simultaneously in the training phase and further supports fast adaptation to user-specified customized types of editing during inference (e. g., with ~5min fine-tuning per style).
no code implementations • CVPR 2024 • Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee
We marry the locality of objects with globality of scenes by introducing a hybrid 3D representation - explicit for objects and implicit for scenes.
no code implementations • 13 Dec 2023 • Qihang Zhang, Chaoyang Wang, Aliaksandr Siarohin, Peiye Zhuang, Yinghao Xu, Ceyuan Yang, Dahua Lin, Bolei Zhou, Sergey Tulyakov, Hsin-Ying Lee
We are witnessing significant breakthroughs in the technology for generating 3D objects from text.
no code implementations • 7 Dec 2023 • Wen Wang, Kecheng Zheng, Qiuyu Wang, Hao Chen, Zifan Shi, Ceyuan Yang, Yujun Shen, Chunhua Shen
We offer a new perspective on approaching the task of video generation.
no code implementations • CVPR 2024 • Qihang Zhang, Yinghao Xu, Yujun Shen, Bo Dai, Bolei Zhou, Ceyuan Yang
Generating large-scale 3D scenes cannot simply apply existing 3D object synthesis technique since 3D scenes usually hold complex spatial configurations and consist of a number of objects at varying scales.
no code implementations • 30 Nov 2023 • Mengfei Xia, Yujun Shen, Ceyuan Yang, Ran Yi, Wenping Wang, Yong-Jin Liu
In this work, we revisit the mathematical foundations of GANs, and theoretically reveal that the native adversarial loss for GAN training is insufficient to fix the problem of subsets with positive Lebesgue measure of the generated data manifold lying out of the real data manifold.
1 code implementation • 28 Nov 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Maneesh Agrawala, Dahua Lin, Bo Dai
The development of text-to-video (T2V), i. e., generating videos with a given text prompt, has been significantly advanced in recent years.
2 code implementations • 26 Sep 2023 • Yaohui Wang, Xinyuan Chen, Xin Ma, Shangchen Zhou, Ziqi Huang, Yi Wang, Ceyuan Yang, Yinan He, Jiashuo Yu, Peiqing Yang, Yuwei Guo, Tianxing Wu, Chenyang Si, Yuming Jiang, Cunjian Chen, Chen Change Loy, Bo Dai, Dahua Lin, Yu Qiao, Ziwei Liu
To this end, we propose LaVie, an integrated video generation framework that operates on cascaded video latent diffusion models, comprising a base T2V model, a temporal interpolation model, and a video super-resolution model.
Ranked #4 on
Text-to-Video Generation
on EvalCrafter Text-to-Video (ECTV) Dataset
(using extra training data)
1 code implementation • 7 Sep 2023 • Jiapeng Zhu, Ceyuan Yang, Kecheng Zheng, Yinghao Xu, Zifan Shi, Yujun Shen
Due to the difficulty in scaling up, generative adversarial networks (GANs) seem to be falling from grace on the task of text-conditioned image synthesis.
8 code implementations • 10 Jul 2023 • Yuwei Guo, Ceyuan Yang, Anyi Rao, Zhengyang Liang, Yaohui Wang, Yu Qiao, Maneesh Agrawala, Dahua Lin, Bo Dai
Once trained, the motion module can be inserted into a personalized T2I model to form a personalized animation generator.
no code implementations • 20 Jan 2023 • Jianyuan Wang, Lalit Bhagat, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
In this work, we propose a self-supervised approach to improve the spatial steerability of GANs without searching for steerable directions in the latent space or requiring extra annotations.
no code implementations • 12 Jan 2023 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
In this work we investigate that such a generative feature learned from image synthesis exhibits great potentials in solving a wide range of computer vision tasks, including both generative ones and more importantly discriminative ones.
no code implementations • ICCV 2023 • Jiapeng Zhu, Ceyuan Yang, Yujun Shen, Zifan Shi, Bo Dai, Deli Zhao, Qifeng Chen
This work presents an easy-to-use regularizer for GAN training, which helps explicitly link some axes of the latent space to a set of pixels in the synthesized image.
no code implementations • CVPR 2023 • Yinghao Xu, Menglei Chai, Zifan Shi, Sida Peng, Ivan Skorokhodov, Aliaksandr Siarohin, Ceyuan Yang, Yujun Shen, Hsin-Ying Lee, Bolei Zhou, Sergey Tulyakov
Existing 3D-aware image synthesis approaches mainly focus on generating a single canonical object and show limited capacity in composing a complex scene containing a variety of objects.
1 code implementation • 14 Dec 2022 • Qihang Zhang, Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou
Video generation requires synthesizing consistent and persistent frames with dynamic content over time.
Ranked #1 on
Video Generation
on YouTube Driving
1 code implementation • CVPR 2023 • Qingyan Bai, Ceyuan Yang, Yinghao Xu, Xihui Liu, Yujiu Yang, Yujun Shen
Generative adversarial network (GAN) is formulated as a two-player game between a generator (G) and a discriminator (D), where D is asked to differentiate whether an image comes from real data or is produced by G. Under such a formulation, D plays as the rule maker and hence tends to dominate the competition.
no code implementations • 20 Sep 2022 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Deli Zhao, Bo Dai, Bolei Zhou
Two capacity adjusting schemes are developed for training GANs under different data regimes: i) given a sufficient amount of training data, the discriminator benefits from a progressively increased learning capacity, and ii) when the training data is limited, gradually decreasing the layer width mitigates the over-fitting issue of the discriminator.
1 code implementation • 14 Jul 2022 • Zhengkai Jiang, Yuxi Li, Ceyuan Yang, Peng Gao, Yabiao Wang, Ying Tai, Chengjie Wang
Unsupervised Domain Adaptation (UDA) aims to adapt the model trained on the labeled source domain to an unlabeled target domain.
Ranked #15 on
Unsupervised Domain Adaptation
on SYNTHIA-to-Cityscapes
1 code implementation • 25 May 2022 • Zhaoyang Lyu, Xudong Xu, Ceyuan Yang, Dahua Lin, Bo Dai
By modeling the reverse process of gradually diffusing the data distribution into a Gaussian distribution, generating a sample in DDPMs can be regarded as iteratively denoising a randomly sampled Gaussian noise.
1 code implementation • CVPR 2022 • Yinghao Xu, Sida Peng, Ceyuan Yang, Yujun Shen, Bolei Zhou
The feature field is further accumulated into a 2D feature map as the textural representation, followed by a neural renderer for appearance synthesis.
no code implementations • CVPR 2022 • Yinghao Xu, Fangyun Wei, Xiao Sun, Ceyuan Yang, Yujun Shen, Bo Dai, Bolei Zhou, Stephen Lin
Typically in recent work, the pseudo-labels are obtained by training a model on the labeled data, and then using confident predictions from the model to teach itself.
1 code implementation • CVPR 2022 • Jianyuan Wang, Ceyuan Yang, Yinghao Xu, Yujun Shen, Hongdong Li, Bolei Zhou
We further propose to align the spatial awareness of G with the attention map induced from D. Through this way we effectively lessen the information gap between D and G. Extensive results show that our method pushes the two-player game in GANs closer to the equilibrium, leading to a better synthesis performance.
no code implementations • ICCV 2023 • Ceyuan Yang, Yujun Shen, Zhiyi Zhang, Yinghao Xu, Jiapeng Zhu, Zhirong Wu, Bolei Zhou
We then equip the well-learned discriminator backbone with an attribute classifier to ensure that the generator captures the appropriate characters from the reference.
no code implementations • 29 Sep 2021 • Haoyue Bai, Ceyuan Yang, Yinghao Xu, S.-H. Gary Chan, Bolei Zhou
In this paper, we employ interpolated generative models to generate OoD samples at training time via data augmentation.
1 code implementation • NeurIPS 2021 • Ceyuan Yang, Yujun Shen, Yinghao Xu, Bolei Zhou
Meanwhile, the learned instance discrimination capability from the discriminator is in turn exploited to encourage the generator for diverse generation.
Ranked #11 on
Image Generation
on FFHQ 256 x 256
(FD metric)
1 code implementation • CVPR 2021 • Ceyuan Yang, Zhirong Wu, Bolei Zhou, Stephen Lin
The pretext task is to predict the instance category given the composited images as well as the foreground bounding boxes.
1 code implementation • CVPR 2021 • Yinghao Xu, Yujun Shen, Jiapeng Zhu, Ceyuan Yang, Bolei Zhou
Generative Adversarial Networks (GANs) have recently advanced image synthesis by learning the underlying distribution of the observed data.
1 code implementation • 29 Jun 2020 • Yinghao Xu, Ceyuan Yang, Ziwei Liu, Bo Dai, Bolei Zhou
Recent attempts for unsupervised landmark learning leverage synthesized image pairs that are similar in appearance but different in poses.
1 code implementation • 28 Jun 2020 • Ceyuan Yang, Yinghao Xu, Bo Dai, Bolei Zhou
Visual tempo, which describes how fast an action goes, has shown its potential in supervised action recognition.
2 code implementations • 18 May 2020 • Yujun Shen, Ceyuan Yang, Xiaoou Tang, Bolei Zhou
In this work, we propose a framework called InterFaceGAN to interpret the disentangled face representation learned by the state-of-the-art GAN models and study the properties of the facial semantics encoded in the latent space.
3 code implementations • CVPR 2020 • Ceyuan Yang, Yinghao Xu, Jianping Shi, Bo Dai, Bolei Zhou
Previous works often capture the visual tempo through sampling raw videos at multiple rates and constructing an input-level frame pyramid, which usually requires a costly multi-branch network to handle.
Ranked #105 on
Action Recognition
on Something-Something V2
2 code implementations • 21 Nov 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou
Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there lacks enough understanding on what generative models have learned inside the deep generative representations and how photo-realistic images are able to be composed of the layer-wise stochasticity introduced in recent GANs.
1 code implementation • ECCV 2020 • Zhengkai Jiang, Yu Liu, Ceyuan Yang, Jihao Liu, Peng Gao, Qian Zhang, Shiming Xiang, Chunhong Pan
Transferring existing image-based detectors to the video is non-trivial since the quality of frames is always deteriorated by part occlusion, rare pose, and motion blur.
Ranked #24 on
Video Object Detection
on ImageNet VID
no code implementations • 25 Sep 2019 • Ceyuan Yang, Yujun Shen, Bolei Zhou
Despite the success of Generative Adversarial Networks (GANs) in image synthesis, there lacks enough understanding on what networks have learned inside the deep generative representations and how photo-realistic images are able to be composed from random noises.
no code implementations • ECCV 2018 • Xinge Zhu, Hui Zhou, Ceyuan Yang, Jianping Shi, Dahua Lin
Due to the expensive and time-consuming annotations (e. g., segmentation) for real-world images, recent works in computer vision resort to synthetic data.
no code implementations • ECCV 2018 • Ceyuan Yang, Zhe Wang, Xinge Zhu, Chen Huang, Jianping Shi, Dahua Lin
Human pose, on the other hand, can represent motion patterns intrinsically and interpretably, and impose the geometric constraints regardless of appearance.