Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval

1 code implementation • 21 Jul 2024 • Yiyang Jiang, WengYu Zhang, Xulu Zhang, XiaoYong Wei, Chang Wen Chen, Qing Li

Through a feasibility study, we demonstrate that LLM encoders effectively refine inter-concept relations in multimodal embeddings, even without being trained on textual embeddings.

General Knowledge, Highlight Detection, +4

UniDiff: Advancing Vision-Language Models with Generative and Discriminative Learning

no code implementations • 1 Jun 2023 • Xiao Dong, Runhui Huang, XiaoYong Wei, Zequn Jie, Jianxing Yu, Jian Yin, Xiaodan Liang

Recent advances in vision-language pre-training have enabled machines to perform better in multimodal object discrimination (e.g., image-text semantic alignment) and image synthesis (e.g., text-to-image generation).

Contrastive Learning, Retrieval, +1

Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval

no code implementations • 17 Jun 2022 • Xiao Dong, Xunlin Zhan, Yunchao Wei, XiaoYong Wei, YaoWei Wang, Minlong Lu, Xiaochun Cao, Xiaodan Liang

Our goal in this research is to study a more realistic environment in which we can conduct weakly-supervised multi-modal instance-level product retrieval for fine-grained product categories.


Indicative Image Retrieval: Turning Blackbox Learning into Grey

no code implementations • 28 Jan 2022 • Xulu Zhang, Zhenqun Yang, Hao Tian, Qing Li, XiaoYong Wei

In many applications, we need the matching evidence to be indicated rather than just a ranked list (e.g., the locations of the target proteins/cells/lesions in medical images).

Image Retrieval, Representation Learning, +1

Deep learning-based person re-identification methods: A survey and outlook of recent works

no code implementations • 10 Oct 2021 • Zhangqiang Ming, Min Zhu, Xiangkun Wang, Jiamin Zhu, Junlong Cheng, Chengrui Gao, Yong Yang, XiaoYong Wei

In recent years, with the increasing demand for public safety and the rapid development of intelligent surveillance networks, person re-identification (Re-ID) has become one of the hot research topics in the computer vision field.

Deep Learning, Metric Learning, +2

Global-Local Dynamic Feature Alignment Network for Person Re-Identification

no code implementations • 13 Sep 2021 • Zhangqiang Ming, Yong Yang, XiaoYong Wei, Jianrong Yan, Xiangkun Wang, Fengjie Wang, Min Zhu

To solve these problems, we propose a simple and efficient Local Sliding Alignment (LSA) strategy to dynamically align the local features of two images by setting a sliding window on the local stripes of the pedestrian.

Pedestrian Detection, Person Re-Identification

M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining

no code implementations • CVPR 2022 • Xiao Dong, Xunlin Zhan, Yangxin Wu, Yunchao Wei, Michael C. Kampffmeyer, XiaoYong Wei, Minlong Lu, YaoWei Wang, Xiaodan Liang

Despite the potential of multi-modal pre-training to learn highly discriminative feature representations from complementary data modalities, current progress is being slowed by the lack of large-scale modality-diverse datasets.

Contrastive Learning

