Search Results for author: Yanhui Wang

Found 5 papers, 0 papers with code

How to Understand Named Entities: Using Common Sense for News Captioning

no code implementations • 11 Mar 2024 • Ning Xu, Yanhui Wang, Tingting Zhang, Hongshuo Tian, Mohan Kankanhalli, An-An Liu

Our approach consists of three modules: (a) Filter Module aims to clarify the common sense concerning a named entity from two aspects: what does it mean?

Common Sense Reasoning

Paper
Add Code

ART$\boldsymbol{\cdot}$V: Auto-Regressive Text-to-Video Generation with Diffusion Models

no code implementations • 30 Nov 2023 • Wenming Weng, Ruoyu Feng, Yanhui Wang, Qi Dai, Chunyu Wang, Dacheng Yin, Zhiyuan Zhao, Kai Qiu, Jianmin Bao, Yuhui Yuan, Chong Luo, Yueyi Zhang, Zhiwei Xiong

Second, it preserves the high-fidelity generation ability of the pre-trained image diffusion models by making only minimal network modifications.

Text-to-Video Generation Video Generation

Paper
Add Code

MicroCinema: A Divide-and-Conquer Approach for Text-to-Video Generation

no code implementations • 30 Nov 2023 • Yanhui Wang, Jianmin Bao, Wenming Weng, Ruoyu Feng, Dacheng Yin, Tao Yang, Jingxu Zhang, Qi Dai Zhiyuan Zhao, Chunyu Wang, Kai Qiu, Yuhui Yuan, Chuanxin Tang, Xiaoyan Sun, Chong Luo, Baining Guo

We present MicroCinema, a straightforward yet effective framework for high-quality and coherent text-to-video generation.

Text-to-Image Generation Text-to-Video Generation +1

Paper
Add Code

Panacea: Panoramic and Controllable Video Generation for Autonomous Driving

no code implementations • 28 Nov 2023 • Yuqing Wen, Yucheng Zhao, Yingfei Liu, Fan Jia, Yanhui Wang, Chong Luo, Chi Zhang, Tiancai Wang, Xiaoyan Sun, Xiangyu Zhang

This work notably propels the field of autonomous driving by effectively augmenting the training dataset used for advanced BEV perception techniques.

Autonomous Driving Video Generation

Paper
Add Code

CCEdit: Creative and Controllable Video Editing via Diffusion Models

no code implementations • 28 Sep 2023 • Ruoyu Feng, Wenming Weng, Yanhui Wang, Yuhui Yuan, Jianmin Bao, Chong Luo, Zhibo Chen, Baining Guo

The versatility of our framework is demonstrated through a diverse range of choices in both structure representations and personalized T2I models, as well as the option to provide the edited key frame.

Text-to-Image Generation Video Editing

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.