Search Results for author: Yanhong Zeng

Found 19 papers, 12 papers with code

DiffSensei: Bridging Multi-Modal LLMs and Diffusion Models for Customized Manga Generation

no code implementations • 10 Dec 2024 • Jianzong Wu, Chao Tang, Jingbo Wang, Yanhong Zeng, Xiangtai Li, Yunhai Tong

Story visualization, the task of creating visual narratives from textual descriptions, has seen progress with text-to-image generation models.

Language Modelling • Large Language Model • +3

HumanVid: Demystifying Training Data for Camera-controllable Human Image Animation

1 code implementation • 24 Jul 2024 • Zhenzhi Wang, Yixuan Li, Yanhong Zeng, Youqing Fang, Yuwei Guo, Wenran Liu, Jing Tan, Kai Chen, Tianfan Xue, Bo Dai, Dahua Lin

Notably, we introduce a rule-based camera trajectory generation method, enabling the synthetic pipeline to incorporate diverse and precise camera motion annotation, which can rarely be found in real-world data.

Benchmarking • Human Animation • +2
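
The rule-based trajectory generation is only summarized above; as a rough, hypothetical illustration (not HumanVid's implementation), a generator of this kind can sample a simple motion rule and emit exact, noise-free per-frame camera extrinsics, which is the property the abstract highlights for synthetic data.

# Hypothetical sketch of a rule-based camera trajectory generator: a steady
# pan plus a steady dolly, rendered as per-frame 4x4 world-to-camera extrinsics.
# Parameter names and motion rules are illustrative only.
import numpy as np

def rule_based_trajectory(num_frames=64, pan_deg_per_s=5.0, dolly_m_per_s=0.1, fps=24):
    """Return a (num_frames, 4, 4) array of camera extrinsic matrices."""
    extrinsics = []
    for t in range(num_frames):
        sec = t / fps
        yaw = np.deg2rad(pan_deg_per_s * sec)    # steady horizontal pan
        z = dolly_m_per_s * sec                  # steady dolly-in along z
        R = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                      [ 0,           1, 0          ],
                      [-np.sin(yaw), 0, np.cos(yaw)]])
        T = np.eye(4)
        T[:3, :3] = R
        T[:3, 3] = [0.0, 0.0, z]
        extrinsics.append(T)
    return np.stack(extrinsics)

trajectory = rule_based_trajectory()
print(trajectory.shape)  # (64, 4, 4) -- exact camera annotation by construction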

Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models

no code implementations • 11 Jul 2024 • Zhening Xing, Gereon Fox, Yanhong Zeng, Xingang Pan, Mohamed Elgharib, Christian Theobalt, Kai Chen

State-of-the-art video diffusion models leverage bi-directional temporal attention to model the correlations between the current frame and all the surrounding (i.e., including future) frames, which hinders them from processing streaming videos.

Denoising • Translation
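
A minimal PyTorch sketch of the contrast the abstract draws: replacing bi-directional temporal attention with a causal (uni-directional) mask so each frame attends only to itself and earlier frames, which is what makes streaming, frame-by-frame inference possible. This is textbook causal masking, not the authors' code.

# Illustrative sketch: uni-directional temporal attention over frame tokens.
import torch
import torch.nn.functional as F

def temporal_attention(q, k, v, causal=True):
    """q, k, v: (batch, frames, dim). Returns (batch, frames, dim)."""
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5   # (B, T, T)
    if causal:
        T = q.shape[1]
        mask = torch.triu(torch.ones(T, T, dtype=torch.bool), diagonal=1)
        scores = scores.masked_fill(mask, float("-inf"))     # block future frames
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(2, 8, 64)
out = temporal_attention(q, k, v, causal=True)
print(out.shape)  # torch.Size([2, 8, 64])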

FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

1 code implementation • 1 Jul 2024 • Yiming Zhang, Yicheng Gu, Yanhong Zeng, Zhening Xing, Yuancheng Wang, Zhizheng Wu, Kai Chen

Meanwhile, the temporal controller incorporates an onset detector and a timestamp-based adapter to achieve precise audio-video alignment.

Audio Generation • Video Alignment • +1
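
As a rough, hypothetical illustration of the timestamp idea (not FoleyCrafter's actual detector): detected onsets can be turned into a binary timestamp mask at the audio frame rate, which an adapter could then condition on to keep generated sound aligned with the video.

# Toy sketch: rising edges of a per-frame energy signal become a timestamp mask.
import numpy as np

def onsets_to_timestamp_mask(energy, threshold=0.5, num_audio_frames=None):
    """energy: per-video-frame motion/energy signal in [0, 1]."""
    onsets = (energy[1:] > threshold) & (energy[:-1] <= threshold)  # rising edges
    onsets = np.concatenate([[False], onsets])
    if num_audio_frames is None:
        num_audio_frames = len(energy)
    # Resample the binary onset track to the audio frame rate.
    idx = np.linspace(0, len(energy) - 1, num_audio_frames).round().astype(int)
    return onsets[idx].astype(np.float32)

energy = np.clip(np.sin(np.linspace(0, 6 * np.pi, 48)), 0, 1)
mask = onsets_to_timestamp_mask(energy, num_audio_frames=96)
print(mask.shape, mask.sum())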

StyleShot: A Snapshot on Any Style

2 code implementations • 1 Jul 2024 • Junyao Gao, Yanchen Liu, Yanan Sun, Yinhao Tang, Yanhong Zeng, Kai Chen, Cairong Zhao

In this paper, we show that a good style representation is crucial and sufficient for generalized style transfer without test-time tuning.

Image Generation • Style Transfer

Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language

no code implementations • 28 Jun 2024 • Yicheng Chen, Xiangtai Li, Yining Li, Yanhong Zeng, Jianzong Wu, Xiangyu Zhao, Kai Chen

Diffusion models can generate realistic and diverse images, potentially facilitating data availability for data-intensive perception tasks.

Image Captioning

MotionBooth: Motion-Aware Customized Text-to-Video Generation

no code implementations • 25 Jun 2024 • Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen

In this work, we present MotionBooth, an innovative framework designed for animating customized subjects with precise control over both object and camera movements.

Text-to-Video Generation • Video Generation

Sagiri: Low Dynamic Range Image Enhancement with Generative Diffusion Prior

1 code implementation • 13 Jun 2024 • Baiang Li, Sizhuo Ma, Yanhong Zeng, Xiaogang Xu, Youqing Fang, Zhao Zhang, Jian Wang, Kai Chen

Capturing High Dynamic Range (HDR) scenery using 8-bit cameras often suffers from over-/underexposure, loss of fine details due to low bit-depth compression, skewed color distributions, and strong noise in dark areas.

Image Enhancement

PIA: Your Personalized Image Animator via Plug-and-Play Modules in Text-to-Image Models

1 code implementation • CVPR 2024 • Yiming Zhang, Zhening Xing, Yanhong Zeng, Youqing Fang, Kai Chen

Recent advancements in personalized text-to-image (T2I) models have revolutionized content creation, empowering non-experts to generate stunning images with unique styles.

Image Animation

A Task is Worth One Word: Learning with Task Prompts for High-Quality Versatile Image Inpainting

1 code implementation • 6 Dec 2023 • Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, Kai Chen

Second, we demonstrate the versatility of the task prompt in PowerPaint by showcasing its effectiveness as a negative prompt for object removal.

Image Inpainting • Object
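
A hedged sketch of the general mechanism that lets a task prompt act as a negative prompt: under standard classifier-free guidance, the sampler is pushed away from whatever the negative embedding describes, so conditioning the negative branch on an "insert an object here" task prompt discourages object generation in the hole. The function and prompts below are placeholders, not the PowerPaint API.

# Conceptual sketch: classifier-free guidance with a task prompt on the
# negative branch (placeholder names, not the authors' code).
import torch

def guided_noise(eps_positive, eps_negative, guidance_scale=7.5):
    """Standard classifier-free guidance combination of two noise predictions."""
    return eps_negative + guidance_scale * (eps_positive - eps_negative)

# eps_* would come from one UNet forward pass per prompt embedding, e.g.
#   eps_positive = unet(x_t, t, encode("empty scenery"))
#   eps_negative = unet(x_t, t, encode("<object-insertion task prompt>"))
eps_pos = torch.randn(1, 4, 64, 64)
eps_neg = torch.randn(1, 4, 64, 64)
eps = guided_noise(eps_pos, eps_neg)
print(eps.shape)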

Degradation-Guided Meta-Restoration Network for Blind Super-Resolution

no code implementations • 3 Jul 2022 • Fuzhi Yang, Huan Yang, Yanhong Zeng, Jianlong Fu, Hongtao Lu

The extractor estimates the degradations in LR inputs and guides the meta-restoration modules to predict restoration parameters for different degradations on-the-fly.

Blind Super-Resolution • Super-Resolution
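
A minimal PyTorch sketch of the general "predict restoration parameters on the fly" pattern, assuming a small hyper-network that maps a degradation embedding to per-sample convolution kernels; module sizes and names are illustrative, not the paper's architecture.

# Sketch: a degradation embedding conditions a hyper-network that predicts
# a different convolution kernel for each input sample.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaConv(nn.Module):
    def __init__(self, channels=16, deg_dim=32, k=3):
        super().__init__()
        self.channels, self.k = channels, k
        # Hyper-network: degradation embedding -> per-sample conv kernel.
        self.hyper = nn.Linear(deg_dim, channels * channels * k * k)

    def forward(self, feat, deg_embed):
        B, C, H, W = feat.shape
        w = self.hyper(deg_embed).view(B * C, C, self.k, self.k)
        # Grouped conv applies each sample's predicted kernel to that sample only.
        out = F.conv2d(feat.reshape(1, B * C, H, W), w, padding=self.k // 2, groups=B)
        return out.view(B, C, H, W)

feat = torch.randn(2, 16, 32, 32)
deg = torch.randn(2, 32)              # would come from the degradation extractor
print(MetaConv()(feat, deg).shape)    # torch.Size([2, 16, 32, 32])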

Advancing High-Resolution Video-Language Representation with Large-Scale Video Transcriptions

1 code implementation • CVPR 2022 • Hongwei Xue, Tiankai Hang, Yanhong Zeng, Yuchong Sun, Bei Liu, Huan Yang, Jianlong Fu, Baining Guo

To enable VL pre-training, we jointly optimize the HD-VILA model by a hybrid Transformer that learns rich spatiotemporal features, and a multimodal Transformer that enforces interactions of the learned video features with diversified texts.

Retrieval • Super-Resolution • +4

Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers

no code implementations • NeurIPS 2021 • Yanhong Zeng, Huan Yang, Hongyang Chao, Jianbo Wang, Jianlong Fu

Given a sequence of style tokens, the TokenGAN is able to control the image synthesis by assigning the styles to the content tokens via an attention mechanism in a Transformer.

Image Generation
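
A small sketch of the core operation described above, assuming plain scaled dot-product cross-attention from content tokens to style tokens; it illustrates the idea rather than reproducing the TokenGAN generator.

# Sketch: each spatial content token attends over the style tokens and mixes
# in the style it needs.
import torch
import torch.nn.functional as F

def assign_styles(content_tokens, style_tokens):
    """content_tokens: (B, N, D); style_tokens: (B, S, D)."""
    attn = F.softmax(
        content_tokens @ style_tokens.transpose(-2, -1) / content_tokens.shape[-1] ** 0.5,
        dim=-1,
    )                                  # (B, N, S): which style each content token uses
    return content_tokens + attn @ style_tokens

content = torch.randn(1, 256, 64)      # e.g. 16x16 spatial tokens
styles = torch.randn(1, 8, 64)         # a sequence of style tokens
print(assign_styles(content, styles).shape)  # torch.Size([1, 256, 64])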

3D Human Body Reshaping with Anthropometric Modeling

1 code implementation • 5 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao

First, we calculate full-body anthropometric parameters from limited user inputs using an imputation technique, so that the essential anthropometric parameters for 3D body reshaping can be obtained.

Feature Selection • Imputation • +1
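
A toy sketch of the imputation step, assuming a least-squares regressor fitted on a table of complete measurements (the paper's imputation method may differ); it maps a few user-provided parameters, such as height and weight, to the remaining anthropometric parameters.

# Sketch: regress missing anthropometric parameters from the known ones.
import numpy as np

rng = np.random.default_rng(0)
full = rng.normal(size=(500, 10))          # 500 bodies x 10 anthropometric params
known_idx, missing_idx = [0, 1], list(range(2, 10))

# Fit a linear map from the known params to the missing ones (with a bias term).
X = np.hstack([full[:, known_idx], np.ones((len(full), 1))])
W, *_ = np.linalg.lstsq(X, full[:, missing_idx], rcond=None)

def impute(user_inputs):
    """user_inputs: array of the known params, shape (2,)."""
    x = np.append(user_inputs, 1.0)
    return x @ W                            # predicted missing params, shape (8,)

print(impute(np.array([0.2, -0.5])).shape)  # (8,)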

Aggregated Contextual Transformations for High-Resolution Image Inpainting

2 code implementations • 3 Apr 2021 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

For improving texture synthesis, we enhance the discriminator of AOT-GAN by training it with a tailored mask-prediction task.

Image Inpainting • Texture Synthesis • +1
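
A conceptual PyTorch sketch of a mask-prediction auxiliary task for an inpainting discriminator: alongside its usual real/fake output, the discriminator predicts which regions were synthesized and is supervised by the downsampled inpainting mask. The layer sizes are placeholders, not AOT-GAN's discriminator.

# Sketch: supervise per-patch "synthesized?" logits with the inpainting mask.
import torch
import torch.nn as nn
import torch.nn.functional as F

disc = nn.Sequential(                       # tiny stand-in discriminator
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 1, 3, padding=1),          # per-patch "synthesized?" logit
)

fake = torch.rand(2, 3, 64, 64)                   # generator output (detached for D step)
mask = (torch.rand(2, 1, 64, 64) > 0.5).float()   # 1 = hole, 0 = valid

logits = disc(fake)                                      # (2, 1, 32, 32)
target = F.interpolate(mask, size=logits.shape[-2:])     # align mask to logit map
mask_pred_loss = F.binary_cross_entropy_with_logits(logits, target)
print(float(mask_pred_loss))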

Learning Semantic-aware Normalization for Generative Adversarial Networks

1 code implementation • NeurIPS 2020 • Heliang Zheng, Jianlong Fu, Yanhong Zeng, Jiebo Luo, Zheng-Jun Zha

Such a model disentangles latent factors according to the semantics of feature channels by channel-/group-wise fusion of latent codes and feature channels.

Image Inpainting • Unconditional Image Generation
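
A rough sketch of group-wise fusion, assuming each channel group is modulated by scale/shift parameters predicted from the latent code; this illustrates the idea of tying latent factors to channel groups, not the paper's exact normalization layer.

# Sketch: channels are split into groups, each modulated by its own
# latent-predicted scale and shift.
import torch
import torch.nn as nn

class GroupwiseModulation(nn.Module):
    def __init__(self, channels=32, groups=4, latent_dim=64):
        super().__init__()
        self.groups = groups
        self.group_ch = channels // groups
        # One scale and one shift per group, predicted from the latent code.
        self.to_params = nn.Linear(latent_dim, 2 * groups)

    def forward(self, feat, z):
        B, C, H, W = feat.shape
        scale, shift = self.to_params(z).chunk(2, dim=-1)       # (B, G) each
        feat = feat.view(B, self.groups, self.group_ch, H, W)
        feat = (feat * (1 + scale.view(B, self.groups, 1, 1, 1))
                + shift.view(B, self.groups, 1, 1, 1))
        return feat.view(B, C, H, W)

print(GroupwiseModulation()(torch.randn(2, 32, 16, 16), torch.randn(2, 64)).shape)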

Learning Pyramid-Context Encoder Network for High-Quality Image Inpainting

2 code implementations • CVPR 2019 • Yanhong Zeng, Jianlong Fu, Hongyang Chao, Baining Guo

As the missing content can be filled by attention transfer from deep to shallow in a pyramid fashion, both visual and semantic coherence for image inpainting can be ensured.

Decoder • Image Inpainting • +1
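
A hedged sketch of deep-to-shallow attention transfer: attention weights computed on low-resolution (deep) features are reused to recombine patches of a higher-resolution (shallow) feature map, which is the pyramid-style filling the abstract refers to. Shapes and the patch size are illustrative, not the PEN-Net code.

# Sketch: reuse a deep-level attention map to mix 2x2 patches of a shallower map.
import torch
import torch.nn.functional as F

def attention_transfer(deep, shallow):
    """deep: (B, C, 8, 8); shallow: (B, C, 16, 16) with 2x the resolution."""
    B, C, H, W = deep.shape
    q = deep.flatten(2).transpose(1, 2)                          # (B, 64, C)
    attn = F.softmax(q @ q.transpose(1, 2) / C ** 0.5, dim=-1)   # (B, 64, 64)
    # Cut the shallow map into 2x2 patches, one per deep-level position.
    patches = F.unfold(shallow, kernel_size=2, stride=2)         # (B, C*4, 64)
    mixed = attn @ patches.transpose(1, 2)                       # reuse deep attention
    return F.fold(mixed.transpose(1, 2), output_size=(2 * H, 2 * W),
                  kernel_size=2, stride=2)

deep, shallow = torch.randn(1, 32, 8, 8), torch.randn(1, 32, 16, 16)
print(attention_transfer(deep, shallow).shape)  # torch.Size([1, 32, 16, 16])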
