1 code implementation • 14 Oct 2024 • Weiwei Sun, Zhengliang Shi, Jiulong Wu, Lingyong Yan, Xinyu Ma, Yiding Liu, Min Cao, Dawei Yin, Zhaochun Ren
Recent information retrieval (IR) models are pre-trained and instruction-tuned on massive datasets and tasks, enabling them to perform well on a wide range of tasks and potentially generalize to unseen tasks with instructions.
no code implementations • 28 Apr 2024 • Daming Gao, Yang Bai, Min Cao, Hao Dou, Mang Ye, Min Zhang
Text-based person search (TBPS) aims to retrieve images of a specific person from a large image gallery based on a natural language description.
no code implementations • 1 Nov 2023 • Mengxia Wu, Min Cao, Yang Bai, Ziyin Zeng, Chen Chen, Liqiang Nie, Min Zhang
In this paper, we present the first empirical study of frame selection for text-to-video retrieval (TVR).
no code implementations • 10 Oct 2023 • Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan
Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets.
1 code implementation • ICCV 2023 • Zhenfeng Fan, Zhiheng Zhang, Shuang Yang, Chongyang Zhong, Min Cao, Shihong Xia
We propose a learning framework for 3D facial attribute translation to relieve these limitations.
1 code implementation • 19 Aug 2023 • Min Cao, Yang Bai, Ziyin Zeng, Mang Ye, Min Zhang
TBPS, as a fine-grained cross-modal retrieval task, is also facing the rise of research on CLIP-based TBPS.
Ranked #6 on Text based Person Retrieval on RSTPReid
1 code implementation • 23 May 2023 • Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang
RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs).
Ranked #3 on Text based Person Retrieval on RSTPReid
no code implementations • 22 May 2023 • Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang
Text-based person search (TBPS) aims to retrieve the images of the target person from a large image gallery based on a given natural language description.
no code implementations • 14 Mar 2023 • Min Cao, Yang Bai, Jingyao Wang, Ziqiang Cao, Liqiang Nie, Min Zhang
The proposed framework, equipped with only two embedding layers, achieves $O(1)$ querying time complexity; when applied prior to common image-text retrieval methods, it improves retrieval efficiency while maintaining retrieval performance.
no code implementations • 9 Mar 2023 • Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian
To explore prompt learning on the generative pre-trained visual model, as well as keeping the task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform the downstream visual classification into the pre-trained masked visual token prediction.
no code implementations • 9 Mar 2023 • Ning Liao, Xiaopeng Zhang, Min Cao, Junchi Yan
In realistic open-set scenarios where the labels of part of the testing data are totally unknown, when vision-language (VL) prompt learning methods encounter inputs related to unknown classes (i.e., not seen during training), they always predict them as one of the training classes.
no code implementations • 20 Oct 2022 • Min Cao, Cong Ding, Chen Chen, Junchi Yan, Qi Tian
Based on a natural assumption that images belonging to the same person identity should not match with images belonging to multiple different person identities across views, called the unicity of person matching on the identity level, we propose an end-to-end person unicity matching architecture for learning and refining the person matching relations.
no code implementations • 24 Aug 2022 • Qi Lv, Ziqiang Cao, Wenrui Xie, Derui Wang, Jingwen Wang, Zhiwei Hu, Tangkun Zhang, Ba Yuan, Yuanhang Li, Min Cao, Wenjie Li, Sujian Li, Guohong Fu
Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model.
no code implementations • 22 Aug 2022 • Xu Yan, Chunhui Ai, Ziqiang Cao, Min Cao, Sujian Li, Wenjie Li, Guohong Fu
While the builders of existing image-text retrieval datasets strive to ensure that the caption matches the linked image, they cannot prevent a caption from fitting other images.
no code implementations • 28 Mar 2022 • Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang
On top of this, the efficiency-focused study on the ITR system is introduced as the third perspective.
1 code implementation • 13 Dec 2021 • Shiping Li, Min Cao, Min Zhang
In this paper, we propose a semantic-aligned embedding method for text-based person search, in which the feature alignment across modalities is achieved by automatically learning the semantic-aligned visual features and textual features.
Ranked #11 on Text based Person Retrieval on CUHK-PEDES
1 code implementation • 7 Sep 2020 • Min Cao, Chen Chen, Hao Dou, Xiyuan Hu, Silong Peng, Arjan Kuijper
Most existing person re-identification methods compute pairwise similarity by extracting robust visual features and learning the discriminative metric.
no code implementations • 25 May 2018 • Chen Chen, Min Cao, Xiyuan Hu, Silong Peng
Ideally, person re-identification seeks a perfect feature representation and metric model that re-identify all pedestrians well across non-overlapping views at different locations with different camera configurations, which is very challenging.
no code implementations • 30 May 2015 • Yi-bin Huang, Kang Li, Ge Wang, Min Cao, Pin Li, Yu-jia Zhang
To address the question of whether the Graphics Processing Unit (GPU), a stream processor with high floating-point performance, is applicable to neural networks, this paper proposes a parallel recognition algorithm for Convolutional Neural Networks (CNNs). It adopts Compute Unified Device Architecture (CUDA) technology, defines the parallel data structures, and describes the mapping mechanism for computing tasks on CUDA.