no code implementations • 10 Dec 2024 • Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong
Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models.
1 code implementation • 24 Jul 2024 • Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin
Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data.
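A minimal sketch of what such a rule-based generator might look like (the function names, rule set, and parameterization below are illustrative assumptions, not the paper's code): each named motion rule is expanded into a sequence of per-frame camera poses.

```python
# Rule-based camera trajectory sketch: one named rule -> per-frame 4x4 poses.
import numpy as np

def rotation_y(theta):
    """Rotation about the vertical axis (yaw/pan)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])

def make_trajectory(motion, n_frames=16, magnitude=0.5):
    """Return a list of 4x4 camera-to-world poses for one motion rule."""
    poses = []
    for t in np.linspace(0.0, 1.0, n_frames):
        pose = np.eye(4)
        if motion == "pan_right":       # rotate around the y axis
            pose[:3, :3] = rotation_y(magnitude * t)
        elif motion == "dolly_in":      # translate along the view axis
            pose[2, 3] = -magnitude * t
        elif motion == "crane_up":      # translate vertically
            pose[1, 3] = magnitude * t
        poses.append(pose)
    return poses

poses = make_trajectory("pan_right")
print(len(poses), poses[-1][:3, :3].round(2))
```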
no code implementations • 11 Jul 2024 • Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen
State-of-the-art video diffusion models leverage bi-directional temporal attention to model the correlations between the current frame and all surrounding (i.e., including future) frames, which hinders them from processing streaming videos.
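A toy illustration of the distinction: with a causal mask, each frame attends only to itself and past frames, which is what streaming processing requires (a generic sketch of the masking idea, not the model's implementation).

```python
# Bi-directional vs. causal temporal attention over per-pixel frame tokens.
import torch
import torch.nn.functional as F

def temporal_attention(x, causal=False):
    """x: (frames, dim) temporal tokens; single-head self-attention."""
    t, d = x.shape
    scores = x @ x.T / d ** 0.5                    # (t, t) frame-to-frame scores
    if causal:
        # Mask out future frames so each frame only sees itself and the past,
        # allowing new frames to be processed as they arrive.
        mask = torch.triu(torch.ones(t, t, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ x

x = torch.randn(8, 64)
print(temporal_attention(x, causal=True).shape)  # torch.Size([8, 64])
```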
1 code implementation • 1 Jul 2024 • Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen
Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment.
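The interface below is a guessed, simplified sketch of how detected onsets could be turned into a timestamp condition track on the audio grid; the function and its arguments are hypothetical, not FoleyCrafter's API.

```python
# Map detected video onset frames to a binary condition track on the audio grid.
import torch

def onsets_to_timestamp_track(onset_frames, fps, audio_len_s, n_audio_steps):
    """onset_frames: frame indices where a sound event should start."""
    track = torch.zeros(n_audio_steps)
    for f in onset_frames:
        t_sec = f / fps                                  # frame -> seconds
        idx = int(t_sec / audio_len_s * n_audio_steps)   # seconds -> audio step
        track[min(idx, n_audio_steps - 1)] = 1.0
    return track

track = onsets_to_timestamp_track([3, 12], fps=8,
                                  audio_len_s=2.0, n_audio_steps=100)
print(track.nonzero().flatten())  # tensor([18, 75])
```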
2 code implementations • 1 Jul 2024 • Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao
In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.
Ranked #1 on Style Transfer on StyleBench
no code implementations • 28 Jun 2024 • Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen
Diffusion models can generate realistic and diverse images, potentially facilitating data availability for data-intensive perception tasks.
no code implementations • 25 Jun 2024 • Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen
In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.
1 code implementation • 13 Jun 2024 • Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen
Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas.
no code implementations • CVPR 2024 • Junshu Tang, Yanhong Zeng, Ke Fan, Xuheng Wang, Bo Dai, Kai Chen, Lizhuang Ma
Creating and animating 3D biped cartoon characters is crucial and valuable in various applications.
1 code implementation • CVPR 2024 • Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen
Recent advancements in personalized text-to-image (T2I) models have revolutionized content creation, empowering non-experts to generate stunning images with unique styles.
1 code implementation • 6 Dec 2023 • Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen
Second, we demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal.
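Negative prompting is typically realized through classifier-free guidance; the sketch below shows the generic guidance formula with the negative prompt's noise estimate in place of the unconditional one (names are illustrative, not PowerPaint's API).

```python
import torch

def guided_noise(eps_cond, eps_neg, scale=7.5):
    """Classifier-free guidance with a negative prompt: steer toward the
    conditional estimate and away from the negative-prompt estimate."""
    return eps_neg + scale * (eps_cond - eps_neg)

# eps_* would come from the diffusion model's noise predictions (toy tensors here).
eps_cond, eps_neg = torch.randn(1, 4, 64, 64), torch.randn(1, 4, 64, 64)
print(guided_noise(eps_cond, eps_neg).shape)  # torch.Size([1, 4, 64, 64])
```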
no code implementations • 3 Jul 2022 • Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu
The extractor estimates the degradations in LR inputs and guides the meta-restoration modules to predict restoration parameters for different degradations on-the-fly.
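One common way to realize such on-the-fly parameter prediction is a small hypernetwork; the sketch below assumes that structure (module names and shapes are illustrative, not the paper's).

```python
# Degradation-conditioned meta-restoration: a hypernetwork maps the estimated
# degradation embedding to the weights of a restoration convolution.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaRestoration(nn.Module):
    def __init__(self, deg_dim=32, channels=16, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Predicts a (channels, channels, k, k) kernel from the degradation code.
        self.hyper = nn.Linear(deg_dim, channels * channels * k * k)

    def forward(self, feat, deg_code):
        w = self.hyper(deg_code).view(self.channels, self.channels, self.k, self.k)
        return F.conv2d(feat, w, padding=self.k // 2)

m = MetaRestoration()
out = m(torch.randn(1, 16, 32, 32), torch.randn(32))
print(out.shape)  # torch.Size([1, 16, 32, 32])
```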
1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo
To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.
Ranked #17 on Video Retrieval on MSR-VTT
no code implementations • NeurIPS 2021 • Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu
Given a sequence of style tokens, the TokenGAN is able to control image synthesis by assigning the styles to the content tokens through an attention mechanism within a Transformer.
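The core assignment step can be illustrated with plain cross-attention, where content tokens act as queries over the style tokens (a simplified sketch of the idea, not the TokenGAN implementation):

```python
# Style-to-content assignment via cross-attention: each content token pools
# the style tokens most relevant to its spatial location.
import torch
import torch.nn.functional as F

def assign_styles(content, style, dim=64):
    """content: (n_content, dim); style: (n_style, dim)."""
    attn = F.softmax(content @ style.T / dim ** 0.5, dim=-1)  # (n_content, n_style)
    return content + attn @ style                             # style-modulated tokens

out = assign_styles(torch.randn(256, 64), torch.randn(16, 64))
print(out.shape)  # torch.Size([256, 64])
```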
1 code implementation • 5 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao
First, we calculate full-body anthropometric parameters from limited user inputs using an imputation technique, so that the essential anthropometric parameters for 3D body reshaping can be obtained.
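A hedged sketch of such an imputation step, assuming a simple least-squares regressor fit on a measurement dataset (the data and indices below are toy stand-ins):

```python
# Impute missing anthropometric measurements from the few user-provided ones.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 10))          # stand-in measurement dataset
known_idx, missing_idx = [0, 1], list(range(2, 10))

# Least-squares regressor: known measurements -> missing measurements.
X, Y = data[:, known_idx], data[:, missing_idx]
W, *_ = np.linalg.lstsq(np.c_[X, np.ones(len(X))], Y, rcond=None)

user_input = np.array([1.72, 0.65])        # e.g. height and arm length (toy units)
imputed = np.r_[user_input, 1.0] @ W       # estimate of the missing parameters
print(imputed.shape)                       # (8,)
```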
2 code implementations • 3 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
To improve texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task (sketched below).
Ranked #11 on Image Inpainting on Places2
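The mask-prediction objective can be sketched as follows: a fully convolutional discriminator outputs a per-pixel map that should match the hole mask on inpainted images and be zero on real ones (a simplified reading of the described task, not the AOT-GAN code).

```python
# Toy mask-prediction discriminator objective for inpainting.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(                     # toy fully convolutional critic
    nn.Conv2d(3, 32, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 3, padding=1),
)

fake, real = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
mask = (torch.rand(1, 1, 64, 64) > 0.5).float()   # 1 = hole region

# Predict the hole mask on inpainted fakes, an all-zeros map on reals.
loss_d = F.binary_cross_entropy_with_logits(disc(fake), mask) + \
         F.binary_cross_entropy_with_logits(disc(real), torch.zeros_like(mask))
print(loss_d.item())
```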
1 code implementation • NeurIPS 2020 • Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha
Such a model disentangles latent factors according to the semantics of feature channels through channel-/group-wise fusion of latent codes and feature channels.
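A minimal sketch of group-wise fusion under one plausible reading: the latent code is split into groups, each modulating its own group of feature channels (the exact fusion form in the paper may differ).

```python
# Group-wise fusion: each latent entry scales one group of feature channels.
import torch

feat = torch.randn(1, 64, 8, 8)             # feature map with 64 channels
z = torch.randn(8)                           # latent code, one entry per group

groups = feat.view(1, 8, 8, 8, 8)            # split 64 channels into 8 groups
fused = (groups * z.view(1, 8, 1, 1, 1)).view(1, 64, 8, 8)
print(fused.shape)  # torch.Size([1, 64, 8, 8])
```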
2 code implementations • ECCV 2020 • Yanhong Zeng, Jianlong Fu, Hongyang Chao
In this paper, we propose to learn a joint Spatial-Temporal Transformer Network (STTN) for video inpainting (see the sketch below).
Ranked #5 on Seeing Beyond the Visible on KITTI360-EX
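The joint attention at STTN's core can be illustrated by flattening patches across space and time into one token set (heavily simplified relative to the full model):

```python
# Joint spatial-temporal attention: patches from all frames attend to each other.
import torch
import torch.nn.functional as F

frames, patches, dim = 5, 16, 32
tokens = torch.randn(frames * patches, dim)        # flatten space and time

attn = F.softmax(tokens @ tokens.T / dim ** 0.5, dim=-1)
out = (attn @ tokens).view(frames, patches, dim)   # each patch sees all frames
print(out.shape)  # torch.Size([5, 16, 32])
```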
2 code implementations • CVPR 2019 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo
Because missing content can be filled by attention transfer from deep to shallow layers in a pyramid fashion, both visual and semantic coherence can be ensured for image inpainting.
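The deep-to-shallow transfer can be sketched as computing attention once on deep features and reusing the same weights to aggregate features at a shallower level (illustrative of the idea, not the paper's implementation):

```python
# Pyramid attention transfer: attention weights computed on deep features are
# reused to fill the corresponding tokens at a shallower level.
import torch
import torch.nn.functional as F

deep = torch.randn(16, 64)      # deep, low-resolution features (16 tokens)
shallow = torch.randn(16, 128)  # matching tokens at a shallower, richer level

attn = F.softmax(deep @ deep.T / 64 ** 0.5, dim=-1)   # computed once, deep level
filled_shallow = attn @ shallow                       # transferred to shallow level
print(filled_shallow.shape)  # torch.Size([16, 128])
```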