Search Results for author: Qingpei Guo

Found 17 papers, 10 papers with code

Social Debiasing for Fair Multi-modal LLMs

no code implementations • 13 Aug 2024 • Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie

Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.

counterfactual

Hummer: Towards Limited Competitive Preference Dataset

no code implementations • 19 May 2024 • Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng

Preference datasets are essential for incorporating human preferences into pre-trained language models, playing a key role in the success of Reinforcement Learning from Human Feedback.

SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval

no code implementations • 22 Apr 2024 • Xuzheng Yu, Chen Jiang, Xingning Dong, Tian Gan, Ming Yang, Qingpei Guo

In particular, text-video retrieval, which aims to find the top matching videos given text descriptions from a vast video corpus, is an essential function, the primary challenge of which is to bridge the modality gap.

Retrieval, Video Retrieval

M2-RAAP: A Multi-Modal Recipe for Advancing Adaptation-based Pre-training towards Effective and Efficient Zero-shot Video-text Retrieval

1 code implementation • 31 Jan 2024 • Xingning Dong, Zipeng Feng, Chunluan Zhou, Xuzheng Yu, Ming Yang, Qingpei Guo

We then summarize this empirical study into the M2-RAAP recipe, where our technical contributions lie in 1) the data filtering and text re-writing pipeline resulting in 1M high-quality bilingual video-text pairs, 2) the replacement of video inputs with key-frames to accelerate pre-training, and 3) the Auxiliary-Caption-Guided (ACG) strategy to enhance video features.

Text Retrieval, Video-Text Retrieval

SNP-S3: Shared Network Pre-training and Significant Semantic Strengthening for Various Video-Text Tasks

1 code implementation • 31 Jan 2024 • Xingning Dong, Qingpei Guo, Tian Gan, Qing Wang, Jianlong Wu, Xiangyuan Ren, Yuan Cheng, Wei Chu

By employing one shared BERT-type network to refine textual and cross-modal features simultaneously, SNP is lightweight and could support various downstream applications.

Sentence

Knowledge-enhanced Multi-perspective Video Representation Learning for Scene Recognition

no code implementations • 9 Jan 2024 • Xuzheng Yu, Chen Jiang, Wei Zhang, Tian Gan, Linlin Chao, Jianan Zhao, Yuan Cheng, Qingpei Guo, Wei Chu

With the explosive growth of video data in real-world applications, a comprehensive representation of videos becomes increasingly important.

Representation Learning, Scene Recognition

SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment

1 code implementation • 4 Jan 2024 • Ziping Ma, Furong Xu, Jian Liu, Ming Yang, Qingpei Guo

To achieve multimodal alignment from both global and local perspectives, this paper proposes Symmetrizing Contrastive Captioners (SyCoCa), which introduces bidirectional interactions on images and texts across the global and local representation levels.

Image Captioning, Image Classification, +7

Pink: Unveiling the Power of Referential Comprehension for Multi-modal LLMs

2 code implementations • CVPR 2024 • Shiyu Xuan, Qingpei Guo, Ming Yang, Shiliang Zhang

Specifically, we present a new method for constructing the instruction tuning dataset at a low cost by leveraging annotations in existing datasets.

Referring Expression

Dual-Modal Attention-Enhanced Text-Video Retrieval with Triplet Partial Margin Contrastive Learning

1 code implementation • 20 Sep 2023 • Chen Jiang, Hong Liu, Xuzheng Yu, Qing Wang, Yuan Cheng, Jia Xu, Zhongyi Liu, Qingpei Guo, Wei Chu, Ming Yang, Yuan Qi

We thereby present a new Triplet Partial Margin Contrastive Learning (TPM-CL) module to construct partial order triplet samples by automatically generating fine-grained hard negatives for matched text-video pairs.

Contrastive Learning, Retrieval, +3

EVE: Efficient zero-shot text-based Video Editing with Depth Map Guidance and Temporal Consistency Constraints

1 code implementation • 21 Aug 2023 • Yutao Chen, Xingning Dong, Tian Gan, Chunluan Zhou, Ming Yang, Qingpei Guo

Compared with images, we conjecture that videos necessitate more constraints to preserve the temporal consistency during editing.

Video Editing

Temporal Sentence Grounding in Streaming Videos

1 code implementation • 14 Aug 2023 • Tian Gan, Xiao Wang, Yan Sun, Jianlong Wu, Qingpei Guo, Liqiang Nie

The goal of TSGSV is to evaluate the relevance between a video stream and a given sentence query.

Sentence, Temporal Sentence Grounding

Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input

no code implementations • 25 Jun 2023 • Qingpei Guo, Kaisheng Yao, Wei Chu

They can achieve exceptional performances on specific tasks, but face a particularly challenging problem of modality mismatch because of diversity of input modalities and their fixed structures.

Diversity, Image-text Retrieval, +6

Boundary-aware Backward-Compatible Representation via Adversarial Learning in Image Retrieval

1 code implementation • CVPR 2023 • Tan Pan, Furong Xu, Xudong Yang, Sifeng He, Chen Jiang, Qingpei Guo, Feng Qian, Xiaobo Zhang, Yuan Cheng, Lei Yang, Wei Chu

For traditional model upgrades, the old model will not be replaced by the new one until the embeddings of all the images in the database are re-computed by the new model, which takes days or weeks for a large amount of data.

Image Retrieval, Retrieval

CNVid-3.5M: Build, Filter, and Pre-Train the Large-Scale Public Chinese Video-Text Dataset

1 code implementation • CVPR 2023 • Tian Gan, Qing Wang, Xingning Dong, Xiangyuan Ren, Liqiang Nie, Qingpei Guo

Though there are certain methods studying the Chinese video-text pre-training, they pre-train their models on private datasets whose videos and text are unavailable.

LPSNet: A Lightweight Solution for Fast Panoptic Segmentation

no code implementations • CVPR 2021 • Weixiang Hong, Qingpei Guo, Wei Zhang, Jingdong Chen, Wei Chu

Panoptic segmentation is a challenging task aiming to simultaneously segment objects (things) at instance level and background contents (stuff) at semantic level.

Instance Segmentation, Panoptic Segmentation, +1
