Search Results for author: Min Cao

Found 17 papers, 5 papers with code

On the Evaluation and Refinement of Vision-Language Instruction Tuning Datasets

no code implementations | 10 Oct 2023 | Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan

Instead of evaluating the models directly, in this paper, we try to evaluate the Vision-Language Instruction-Tuning (VLIT) datasets.

Benchmarking

An Empirical Study of CLIP for Text-based Person Search

1 code implementation | 19 Aug 2023 | Min Cao, Yang Bai, Ziyin Zeng, Mang Ye, Min Zhang

TBPS, as a fine-grained cross-modal retrieval task, is also facing the rise of research on the CLIP-based TBPS.

Cross-Modal Retrieval, Data Augmentation, +5

RaSa: Relation and Sensitivity Aware Representation Learning for Text-based Person Search

1 code implementation | 23 May 2023 | Yang Bai, Min Cao, Daming Gao, Ziqiang Cao, Chen Chen, Zhenfeng Fan, Liqiang Nie, Min Zhang

RA offsets the overfitting risk by introducing a novel positive relation detection task (i.e., learning to distinguish strong and weak positive pairs).

Person Search, Relation, +2

Text-based Person Search without Parallel Image-Text Data

no code implementations | 22 May 2023 | Yang Bai, Jingyao Wang, Min Cao, Chen Chen, Ziqiang Cao, Liqiang Nie, Min Zhang

Text-based person search (TBPS) aims to retrieve the images of the target person from a large image gallery based on a given natural language description.

Image Captioning, Language Modelling, +4

Efficient Image-Text Retrieval via Keyword-Guided Pre-Screening

no code implementations | 14 Mar 2023 | Min Cao, Yang Bai, Jingyao Wang, Ziqiang Cao, Liqiang Nie, Min Zhang

The proposed framework equipped with only two embedding layers achieves $O(1)$ querying time complexity, while improving the retrieval efficiency and keeping its performance, when applied prior to the common image-text retrieval methods.

Multi-Label Classification, Multi-Task Learning, +2

M-Tuning: Prompt Tuning with Mitigated Label Bias in Open-Set Scenarios

no code implementations | 9 Mar 2023 | Ning Liao, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

In realistic open-set scenarios where labels of a part of testing data are totally unknown, when vision-language (VL) prompt learning methods encounter inputs related to unknown classes (i.e., not seen during training), they always predict them as one of the training classes.

Open Set Learning

Rethinking Visual Prompt Learning as Masked Visual Token Modeling

no code implementations | 9 Mar 2023 | Ning Liao, Bowen Shi, Xiaopeng Zhang, Min Cao, Junchi Yan, Qi Tian

To explore prompt learning on the generative pre-trained visual model, as well as keeping the task consistency, we propose Visual Prompt learning as masked visual Token Modeling (VPTM) to transform the downstream visual classification into the pre-trained masked visual token prediction.

End-to-End Context-Aided Unicity Matching for Person Re-identification

no code implementations | 20 Oct 2022 | Min Cao, Cong Ding, Chen Chen, Junchi Yan, Qi Tian

Based on a natural assumption that images belonging to the same person identity should not match with images belonging to multiple different person identities across views, called the unicity of person matching on the identity level, we propose an end-to-end person unicity matching architecture for learning and refining the person matching relations.

Graph Matching, Person Re-Identification

Visual Subtitle Feature Enhanced Video Outline Generation

no code implementations | 24 Aug 2022 | Qi Lv, Ziqiang Cao, Wenrui Xie, Derui Wang, Jingwen Wang, Zhiwei Hu, Tangkun Zhang, Ba Yuan, Yuanhang Li, Min Cao, Wenjie Li, Sujian Li, Guohong Fu

Furthermore, based on the similarity between video outlines and textual outlines, we use a large number of articles with chapter headings to pretrain our model.

Headline Generation, Navigate, +4

Revising Image-Text Retrieval via Multi-Modal Entailment

no code implementations | 22 Aug 2022 | Xu Yan, Chunhui Ai, Ziqiang Cao, Min Cao, Sujian Li, Wenjie Li, Guohong Fu

While the builders of existing image-text retrieval datasets strive to ensure that the caption matches the linked image, they cannot prevent a caption from fitting other images.

Natural Language Inference, Retrieval, +2

Image-text Retrieval: A Survey on Recent Research and Development

no code implementations | 28 Mar 2022 | Min Cao, Shiping Li, Juntao Li, Liqiang Nie, Min Zhang

On top of this, the efficiency-focused study on the ITR system is introduced as the third perspective.

Retrieval, Text Retrieval

Learning Semantic-Aligned Feature Representation for Text-based Person Search

1 code implementation | 13 Dec 2021 | Shiping Li, Min Cao, Min Zhang

In this paper, we propose a semantic-aligned embedding method for text-based person search, in which the feature alignment across modalities is achieved by automatically learning the semantic-aligned visual features and textual features.

Person Search, Text based Person Retrieval, +1

Progressive Bilateral-Context Driven Model for Post-Processing Person Re-Identification

1 code implementation | 7 Sep 2020 | Min Cao, Chen Chen, Hao Dou, Xiyuan Hu, Silong Peng, Arjan Kuijper

Most existing person re-identification methods compute pairwise similarity by extracting robust visual features and learning the discriminative metric.

Large-Scale Person Re-Identification

Key Person Aided Re-identification in Partially Ordered Pedestrian Set

no code implementations | 25 May 2018 | Chen Chen, Min Cao, Xiyuan Hu, Silong Peng

Ideally, person re-identification seeks a perfect feature representation and metric model that re-identify all pedestrians well in non-overlapping views at different locations with different camera configurations, which is very challenging.

Person Re-Identification

Recognition of convolutional neural network based on CUDA Technology

no code implementations | 30 May 2015 | Yi-bin Huang, Kang Li, Ge Wang, Min Cao, Pin Li, Yu-jia Zhang

To address the question of whether the Graphics Processing Unit (GPU), a stream processor with high floating-point performance, is applicable to neural networks, this paper proposes a parallel recognition algorithm for Convolutional Neural Networks (CNNs). It adopts Compute Unified Device Architecture (CUDA) technology, defines the parallel data structures, and describes the mapping mechanism for computing tasks on CUDA.
