Search Results for author: Lele Cheng

Found 6 papers, 1 paper with code

Decouple Content and Motion for Conditional Image-to-Video Generation

no code implementations • 24 Nov 2023 • Cuifeng Shen, Yulu Gan, Chen Chen, Xiongwei Zhu, Lele Cheng, Tingting Gao, Jinzhi Wang

The goal of conditional image-to-video (cI2V) generation is to create a believable new video starting from a given condition, i.e., one image and text. Previous cI2V generation methods conventionally operate in RGB pixel space, which limits their ability to model motion consistency and visual continuity.

Image to Video Generation
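
The snippet stops before describing the method itself, but the title points at the core idea: keep a static content representation fixed while predicting per-frame motion. Below is a minimal, hypothetical PyTorch sketch of that decoupling; every module, dimension, and the pixel-space decoder are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class DecoupledI2V(nn.Module):
    """Illustrative sketch: one static content latent, one motion latent per frame."""
    def __init__(self, latent_dim=64, num_frames=8):
        super().__init__()
        self.num_frames = num_frames
        # Content encoder: condition image -> a single static content latent.
        self.content_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, latent_dim))
        # Motion predictor: text embedding -> one latent offset per future frame.
        self.motion_pred = nn.GRU(input_size=latent_dim, hidden_size=latent_dim, batch_first=True)
        self.decode = nn.Linear(latent_dim, 3 * 32 * 32)  # latent -> frame pixels

    def forward(self, image, text_emb):
        b = image.size(0)
        content = self.content_enc(image)                    # (b, d): what the scene looks like
        steps = text_emb.unsqueeze(1).repeat(1, self.num_frames, 1)
        motion, _ = self.motion_pred(steps)                  # (b, T, d): how it moves
        frames = self.decode(content.unsqueeze(1) + motion)  # combine content + per-frame motion
        return frames.view(b, self.num_frames, 3, 32, 32)

model = DecoupledI2V()
video = model(torch.randn(2, 3, 32, 32), torch.randn(2, 64))
print(video.shape)  # torch.Size([2, 8, 3, 32, 32])
```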

Paragraph-to-Image Generation with Information-Enriched Diffusion Model

1 code implementation • 24 Nov 2023 • Weijia Wu, Zhuang Li, Yefei He, Mike Zheng Shou, Chunhua Shen, Lele Cheng, Yan Li, Tingting Gao, Di Zhang, Zhongyuan Wang

In this paper, we introduce an information-enriched diffusion model for the paragraph-to-image generation task, termed ParaDiffusion, which transfers the extensive semantic comprehension capabilities of large language models to image generation.

Image Generation · Language Modelling +1
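
As a rough illustration of the "transfer an LLM's comprehension to image generation" idea, the sketch below conditions a denoiser on paragraph-token features via cross-attention. The `llm_stub` stands in for a pretrained language model's hidden states; all names and sizes are assumptions, not ParaDiffusion's released implementation.

```python
import torch
import torch.nn as nn

class ParagraphConditionedDenoiser(nn.Module):
    """Illustrative sketch: a denoiser attends to paragraph-token features."""
    def __init__(self, dim=128, vocab=1000):
        super().__init__()
        # Stand-in for a (frozen) large language model: token embeddings plus one
        # transformer layer. In practice this would be a pretrained LLM's hidden states.
        self.llm_stub = nn.Sequential(
            nn.Embedding(vocab, dim),
            nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True),
        )
        # Image latents (queries) attend to paragraph tokens (keys/values).
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.out = nn.Linear(dim, dim)

    def forward(self, noisy_latents, paragraph_ids):
        text = self.llm_stub(paragraph_ids)             # (b, n_tokens, dim)
        attended, _ = self.cross_attn(noisy_latents, text, text)
        return self.out(attended)                       # predicted noise, same shape as latents

model = ParagraphConditionedDenoiser()
eps = model(torch.randn(2, 16, 128), torch.randint(0, 1000, (2, 64)))
print(eps.shape)  # torch.Size([2, 16, 128])
```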

TokenFlow: Rethinking Fine-grained Cross-modal Alignment in Vision-Language Retrieval

no code implementations • 28 Sep 2022 • Xiaohan Zou, Changqiao Wu, Lele Cheng, Zhongyuan Wang

Most existing methods in vision-language retrieval match the two modalities in one of three ways: comparing their global feature vectors, which loses fine-grained information and lacks interpretability; detecting objects in images or videos and aligning the text with their fine-grained features, which relies on complicated model designs; or modeling fine-grained interaction via cross-attention over visual and textual tokens, which suffers from inferior efficiency.

Retrieval · Text Retrieval +1
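
A common alternative to the three approaches listed above is token-level late interaction: score a pair by matching each text token to its most similar visual token. The function below is a generic sketch of that fine-grained scoring, not necessarily TokenFlow's exact formulation.

```python
import torch
import torch.nn.functional as F

def token_level_similarity(visual_tokens, text_tokens):
    """Fine-grained cross-modal score: each text token is matched to its most
    similar visual token, then the matches are averaged.

    visual_tokens: (n_v, d), text_tokens: (n_t, d)
    """
    v = F.normalize(visual_tokens, dim=-1)
    t = F.normalize(text_tokens, dim=-1)
    sim = t @ v.T                         # (n_t, n_v) cosine similarity of every token pair
    return sim.max(dim=-1).values.mean()  # best visual match per text token, averaged

score = token_level_similarity(torch.randn(49, 256), torch.randn(12, 256))
print(float(score))
```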

LaT: Latent Translation with Cycle-Consistency for Video-Text Retrieval

no code implementations • 11 Jul 2022 • Jinbin Bai, Chunhui Liu, Feiyue Ni, Haofan Wang, Mengying Hu, Xiaofeng Guo, Lele Cheng

To overcome the above issue, we present a novel mechanism for learning the translation relationship from a source modality space $\mathcal{S}$ to a target modality space $\mathcal{T}$ without the need for a joint latent space, which bridges the gap between visual and textual domains.

Representation Learning · Retrieval +4
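
The mechanism sketched in the abstract, translating between modality spaces $\mathcal{S}$ and $\mathcal{T}$ with cycle-consistency rather than a joint latent space, can be illustrated with two small translators. The MLPs, dimensions, and loss weighting below are assumptions for demonstration only, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Learn f: S -> T and g: T -> S so that g(f(s)) ~ s and f(g(t)) ~ t,
# instead of projecting both modalities into one shared space.
dim_s, dim_t = 512, 256
f = nn.Sequential(nn.Linear(dim_s, 384), nn.ReLU(), nn.Linear(384, dim_t))  # S -> T
g = nn.Sequential(nn.Linear(dim_t, 384), nn.ReLU(), nn.Linear(384, dim_s))  # T -> S

s = torch.randn(8, dim_s)  # e.g. video features living in source space S
t = torch.randn(8, dim_t)  # e.g. paired text features living in target space T

cycle_loss = F.mse_loss(g(f(s)), s) + F.mse_loss(f(g(t)), t)  # round trips reconstruct
align_loss = F.mse_loss(f(s), t)   # translated video should land near its paired text
loss = cycle_loss + align_loss
loss.backward()
print(float(loss))
```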

Learning from Large-scale Noisy Web Data with Ubiquitous Reweighting for Image Classification

no code implementations • 2 Nov 2018 • Jia Li, Yafei Song, Jianfeng Zhu, Lele Cheng, Ying Su, Lin Ye, Pengcheng Yuan, Shumin Han

In this manner, the influence of bias and noise in the web data can be gradually alleviated, leading to steadily improving performance of URNet.

General Classification · Image Classification
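
One simple way to realize sample reweighting for noisy web labels is to down-weight high-loss (likely mislabeled) examples so they influence the gradient less. The weighting rule below is an illustrative assumption, not URNet's actual ubiquitous reweighting scheme.

```python
import torch
import torch.nn.functional as F

def reweighted_loss(logits, labels):
    """Per-sample reweighted cross-entropy: noisier-looking samples count less."""
    per_sample = F.cross_entropy(logits, labels, reduction="none")  # (b,)
    with torch.no_grad():
        # Softmax over negative losses: cleaner-looking samples get larger weights.
        weights = torch.softmax(-per_sample, dim=0) * len(per_sample)
    return (weights * per_sample).mean()

logits = torch.randn(16, 10, requires_grad=True)
labels = torch.randint(0, 10, (16,))
loss = reweighted_loss(logits, labels)
loss.backward()
print(float(loss))
```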
