no code implementations • 5 Dec 2024 • Xinghui Li, Qichao Sun, Pengze Zhang, Fulong Ye, Zhichao Liao, Wanquan Feng, Songtao Zhao, Qian He
Additionally, we introduce a Garment-Enhanced Texture Learning strategy to improve the fine-grained texture details of garments.
1 code implementation • 19 Aug 2023 • Fulong Ye, Yuxing Long, Fangxiang Feng, Xiaojie Wang
Referring Expression Generation (REG) aims to generate unambiguous Referring Expressions (REs) for objects in a visual scene, with a dual task of Referring Expression Comprehension (REC) to locate the referred object.
1 code implementation • 19 Aug 2023 • Fulong Ye, Guang Liu, Xinya Wu, Ledell Wu
Specifically, we first train a multilingual text encoder based on knowledge distillation.
1 code implementation • 5 Jan 2023 • Yuxing Long, Binyuan Hui, Fulong Ye, Yanyang Li, Zhuoxin Han, Caixia Yuan, Yongbin Li, Xiaojie Wang
Existing multimodal conversation agents have shown impressive abilities to locate absolute positions or retrieve attributes in simple scenarios, but they fail to perform well when complex relative positions and information alignments are involved, which poses a bottleneck in response quality.
2 code implementations • 12 Nov 2022 • Zhongzhi Chen, Guang Liu, Bo-Wen Zhang, Fulong Ye, Qinghong Yang, Ledell Wu
In this work, we present a conceptually simple and effective method to train a strong bilingual/multilingual multimodal representation model.