no code implementations • 10 May 2025 • Jiayang Liu, Siyuan Liang, Shiqian Zhao, RongCheng Tu, Wenbo Zhou, Xiaochun Cao, DaCheng Tao, Siew Kei Lam
Our approach formulates the prompt generation task as an optimization problem with three key objectives: (1) maximizing the semantic similarity between the input and generated prompts, (2) ensuring that the generated prompts can evade the safety filter of the text-to-video model, and (3) maximizing the semantic similarity between the generated videos and the original input prompts.
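The three objectives above can be sketched as a single weighted score over candidate prompts. This is a minimal illustrative sketch, not the authors' actual method: `jaccard_similarity` stands in for a learned semantic-similarity model, `safety_filter_prob` for the text-to-video model's real safety filter, and the video is approximated by a caption string; all names and weights are assumptions.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Toy semantic similarity: token-set Jaccard overlap
    (stand-in for a learned text encoder's cosine similarity)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if (sa | sb) else 0.0

def safety_filter_prob(prompt: str, blocked=("attack", "weapon")) -> float:
    """Toy stand-in for a safety filter's rejection probability,
    based on a small blocklist."""
    hits = sum(word in prompt.lower().split() for word in blocked)
    return min(1.0, hits / 2)

def prompt_score(input_prompt: str, generated_prompt: str,
                 video_caption: str, weights=(1.0, 1.0, 1.0)) -> float:
    """Weighted sum of the three objectives:
    (1) input/generated prompt similarity,
    (2) filter evasion (low rejection probability),
    (3) generated-video/input-prompt similarity, here approximated
        by comparing a caption of the generated video to the input."""
    w1, w2, w3 = weights
    return (w1 * jaccard_similarity(input_prompt, generated_prompt)
            + w2 * (1.0 - safety_filter_prob(generated_prompt))
            + w3 * jaccard_similarity(video_caption, input_prompt))
```

An optimizer would then search the prompt space (e.g., by iterative rewriting) for candidates that maximize this score.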
no code implementations • 22 Apr 2025 • Siyuan Liang, Jiayang Liu, Jiecheng Zhai, Tianmeng Fang, RongCheng Tu, Aishan Liu, Xiaochun Cao, DaCheng Tao
The rapid development of generative artificial intelligence has made text-to-video models essential for building future multimodal world simulators.
1 code implementation • 27 Mar 2025 • Junyu Luo, Weizhi Zhang, Ye Yuan, Yusheng Zhao, Junwei Yang, Yiyang Gu, Bohan Wu, Binqi Chen, Ziyue Qiao, Qingqing Long, RongCheng Tu, Xiao Luo, Wei Ju, Zhiping Xiao, Yifan Wang, Meng Xiao, Chenwu Liu, Jingyang Yuan, Shichang Zhang, Yiqiao Jin, Fan Zhang, Xian Wu, Hanqing Zhao, DaCheng Tao, Philip S. Yu, Ming Zhang
The era of intelligent agents is upon us, driven by revolutionary advancements in large language models.
1 code implementation • 20 Jun 2024 • Xincheng Shuai, Henghui Ding, Xingjun Ma, RongCheng Tu, Yu-Gang Jiang, DaCheng Tao
Image editing aims to modify a given synthetic or real image to meet users' specific requirements.
1 code implementation • CVPR 2023 • Yatai Ji, RongCheng Tu, Jie Jiang, Weijie Kong, Chengfei Cai, Wenzhe Zhao, Hongfa Wang, Yujiu Yang, Wei Liu
Cross-modal alignment is essential for vision-language pre-training (VLP) models to learn the correct corresponding information across different modalities.
Ranked #10 on Zero-Shot Video Retrieval on LSMDC
1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Rui Yan, Eric Zhongcong Xu, RongCheng Tu, Yanru Zhu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Wei Liu, Mike Zheng Shou
In this report, we propose a video-language pretraining (VLP) based solution, EgoVLP, for the EPIC-KITCHENS-100 Multi-Instance Retrieval (MIR) challenge.
1 code implementation • 4 Jul 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
In this report, we propose a video-language pretraining (VLP) based solution, EgoVLP, for four Ego4D challenge tasks: Natural Language Query (NLQ), Moment Query (MQ), Object State Change Classification (OSCC), and PNR Localization (PNR).
2 code implementations • 3 Jun 2022 • Kevin Qinghong Lin, Alex Jinpeng Wang, Mattia Soldan, Michael Wray, Rui Yan, Eric Zhongcong Xu, Difei Gao, RongCheng Tu, Wenzhe Zhao, Weijie Kong, Chengfei Cai, Hongfa Wang, Dima Damen, Bernard Ghanem, Wei Liu, Mike Zheng Shou
Video-Language Pretraining (VLP), which aims to learn transferable representation to advance a wide range of video-text downstream tasks, has recently received increasing attention.