1 code implementation • 21 Jul 2024 • Yiyang Jiang, WengYu Zhang, Xulu Zhang, XiaoYong Wei, Chang Wen Chen, Qing Li
Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings.
Ranked #6 on Natural Language Moment Retrieval on TACoS
no code implementations • 1 Jun 2023 • Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang
Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation).
no code implementations • 17 Jun 2022 • Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang
Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.
no code implementations • 28 Jan 2022 • Xulu Zhang, Zhenqun Yang, Hao Tian, Qing Li, XiaoYong Wei
In many applications, we need the matching evidence to be indicated rather than just have the ranked list (e.g., the locations of the target proteins/cells/lesions in medical images).
no code implementations • 10 Oct 2021 • Zhangqiang Ming, Min Zhu, Xiangkun Wang, Jiamin Zhu, Junlong Cheng, Chengrui Gao, Yong Yang, XiaoYong Wei
In recent years, with the increasing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in the computer vision field.
no code implementations • 13 Sep 2021 • Zhangqiang Ming, Yong Yang, XiaoYong Wei, Jianrong Yan, Xiangkun Wang, Fengjie Wang, Min Zhu
To solve these problems, we propose a simple and efficient Local Sliding Alignment (LSA) strategy to dynamically align the local features of two images by setting a sliding window on the local stripes of the pedestrian.
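As a rough illustration of the sliding-window idea in the excerpt above, the sketch below compares each horizontal stripe of one image against nearby stripes of the other and keeps the closest match. This is not the authors' implementation: the function names, the Euclidean distance, the `window` parameter, and the min-over-window rule are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def sliding_aligned_distance(stripes_a, stripes_b, window=1):
    """Sum, over stripes of image A, of the distance to the closest stripe
    of image B within `window` positions (hypothetical alignment rule).

    stripes_a, stripes_b: lists of feature vectors, one per horizontal
    stripe, ordered top to bottom. window=0 reduces to rigid stripe-to-stripe
    matching.
    """
    if len(stripes_a) != len(stripes_b):
        raise ValueError("both images must have the same number of stripes")
    total = 0.0
    for i, feat_a in enumerate(stripes_a):
        lo = max(0, i - window)                    # first candidate stripe in B
        hi = min(len(stripes_b), i + window + 1)   # one past the last candidate
        total += min(euclidean(feat_a, stripes_b[j]) for j in range(lo, hi))
    return total


# Toy example: image B shows the same content shifted down by one stripe.
stripes_a = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]]
stripes_b = [[9.0, 9.0], [0.0, 0.0], [1.0, 0.0]]
rigid = sliding_aligned_distance(stripes_a, stripes_b, window=0)
aligned = sliding_aligned_distance(stripes_a, stripes_b, window=1)
```

In this toy case the windowed distance is smaller than the rigid one, because each stripe is allowed to match its vertically shifted counterpart.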
no code implementations • CVPR 2022 • Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang
Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.