no code implementations • 20 Aug 2022 • Qingrong Cheng, Keyu Wen, Xiaodong Gu
To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to strengthen the image quality and semantic consistency.
no code implementations • 2 Jul 2022 • Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu
Concretely, the first module is a weight-sharing transformer that builds on the head of the visual and textual encoders, aiming to semantically align text and image.
no code implementations • CVPR 2022 • Cheng Chen, Yudong Zhu, Zhenshan Tan, Qingrong Cheng, Xin Jiang, Qun Liu, Xiaodong Gu
In this paper, we propose a contrastive learning-based framework UTC to unify and facilitate both discriminative and generative tasks in visual dialog with a single model.
1 code implementation • 22 Oct 2020 • Keyu Wen, Xiaodong Gu, Qingrong Cheng
Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network(DSRAN) is proposed which mainly consists of two modules, separate semantic relations module and the joint semantic relations module.