no code implementations • 13 Aug 2022 • Zhenshan Tan, Cheng Chen, Keyu Wen, Yuzhuo Qin, Xiaodong Gu
With the design of negative samples, the noise objects are suppressed.
no code implementations • 2 Jul 2022 • Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu
Concretely, the first module is a weight-sharing transformer that builds on the head of the visual and textual encoders, aiming to semantically align text and image.
no code implementations • CVPR 2022 • Cheng Chen, Yudong Zhu, Zhenshan Tan, Qingrong Cheng, Xin Jiang, Qun Liu, Xiaodong Gu
In this paper, we propose a contrastive learning-based framework UTC to unify and facilitate both discriminative and generative tasks in visual dialog with a single model.