Search Results for author: Qingpei Guo

Found 15 papers, 9 papers with code

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

no code implementations • 22 Apr 2024 • Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo

In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.

Retrieval Video Retrieval

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks

1 code implementation • 31 Jan 2024 • Xingning Dong, Qingpei Guo, Tian Gan, Qing Wang, Jianlong Wu, Xiangyuan Ren, Yuan Cheng, Wei Chu

By employing one shared BERT-type network to refine textual and cross-modal features simultaneously, SNP is lightweight and could support various downstream applications.

Sentence

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

1 code implementation • 31 Jan 2024 • Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo

We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features.

Retrieval Text Retrieval +1

M2-Encoder: Advancing Bilingual Image-Text Understanding by Large-scale Efficient Pretraining

1 code implementation • 29 Jan 2024 • Qingpei Guo, Furong Xu, Hanxiao Zhang, Wang Ren, Ziping Ma, Lin Ju, Jian Wang, Jingdong Chen, Ming Yang

Vision-language foundation models like CLIP have revolutionized the field of artificial intelligence.

Ranked #1 on Zero-shot Image Retrieval on Flickr30k-CN (using extra training data)

Zero-Shot Cross-Modal Retrieval Zero-shot Image Retrieval +3

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

no code implementations • 9 Jan 2024 • Xuzheng Yu, Chen Jiang, Wei Zhang, Tian Gan, Linlin Chao, Jianan Zhao, Yuan Cheng, Qingpei Guo, Wei Chu

With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important.

Representation Learning Scene Recognition

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

no code implementations • 4 Jan 2024 • Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo

To achieve multimodal alignment from both global and local perspectives, this paper proposes Symmetrizing Contrastive Captioners (SyCoCa), which introduces bidirectional interactions on images and texts across the global and local representation levels.

Image Captioning Image Classification +6

Text as Image: Learning Transferable Adapter for Multi-Label Classification

no code implementations • 7 Dec 2023 • Xuelin Zhu, Jiuxin Cao, Jian Liu, Dongqi Tang, Furong Xu, Weijia Liu, Jiawei Ge, Bo Liu, Qingpei Guo, Tianyi Zhang

Pre-trained vision-language models have notably accelerated progress of open-world concept recognition.

Instruction Following Multi-Label Classification +2

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

2 code implementations • 1 Oct 2023 • Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Referring Expression

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

1 code implementation • 20 Sep 2023 • Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi

We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs.

Ranked #4 on Video Retrieval on MSR-VTT-1kA

Contrastive Learning Retrieval +3

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

1 code implementation • 21 Aug 2023 • Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo

Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing.

Video Editing

Temporal Sentence Grounding in Streaming Videos

1 code implementation • 14 Aug 2023 • Tian Gan, Xiao Wang, Yan Sun, Jianlong Wu, Qingpei Guo, Liqiang Nie

The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.

Sentence Temporal Sentence Grounding

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input

no code implementations • 25 Jun 2023 • Qingpei Guo, Kaisheng Yao, Wei Chu

They can achieve exceptional performances on specific tasks, but face a particularly challenging problem of modality mismatch because of diversity of input modalities and their fixed structures.

Question Answering Referring Expression +5

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

1 code implementation • CVPR 2023 • Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian, Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu

For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data.

Image Retrieval Retrieval

CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset

1 code implementation • CVPR 2023 • Tian Gan, Qing Wang, Xingning Dong, Xiangyuan Ren, Liqiang Nie, Qingpei Guo

Though there are certain methods studying the Chinese video-text pre-training, they pre-train their models on private datasets whose videos and text are unavailable.

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

no code implementations • CVPR 2021 • Weixiang Hong, Qingpei Guo, Wei Zhang, Jingdong Chen, Wei Chu

Panoptic segmentation is a challenging task aiming to simultaneously segment objects (things) at instance level and background contents (stuff) at semantic level.

Instance Segmentation Panoptic Segmentation +1
