Search Results for author: Dian Li

Found 11 papers, 7 papers with code

Vision-Language Instruction Tuning: A Review and Analysis

1 code implementation • 14 Nov 2023 • Chen Li, Yixiao Ge, Dian Li, Ying Shan

Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs), aiming to enhance the LLM's ability to generalize instruction execution and adapt to user preferences.

Paper
Code

TagGPT: Large Language Models are Zero-shot Multimodal Taggers

1 code implementation • 6 Apr 2023 • Chen Li, Yixiao Ge, Jiayong Mao, Dian Li, Ying Shan

Given a new entity that needs tagging for distribution, TagGPT introduces two alternative options for zero-shot tagging, i. e., a generative method with late semantic matching with the tag set, and another selective method with early matching in prompts.

Optical Character Recognition (OCR) Prompt Engineering +5

Paper
Code

RILS: Masked Visual Reconstruction in Language Semantic Space

1 code implementation • CVPR 2023 • Shusheng Yang, Yixiao Ge, Kun Yi, Dian Li, Ying Shan, XiaoHu Qie, Xinggang Wang

Both masked image modeling (MIM) and natural language supervision have facilitated the progress of transferable visual pre-training.

Sentence

Paper
Code

Masked Image Modeling with Denoising Contrast

1 code implementation • 19 May 2022 • Kun Yi, Yixiao Ge, Xiaotong Li, Shusheng Yang, Dian Li, Jianping Wu, Ying Shan, XiaoHu Qie

Since the development of self-supervised visual representation learning from contrastive learning to masked image modeling (MIM), there is no significant difference in essence, that is, how to design proper pretext tasks for vision dictionary look-up.

Contrastive Learning Denoising +6

Paper
Code

Controllable Augmentations for Video Representation Learning

no code implementations • 30 Mar 2022 • Rui Qian, Weiyao Lin, John See, Dian Li

The major reason is that the positive pairs, i. e., different clips sampled from the same video, have limited temporal receptive field, and usually share similar background but differ in motions.

Action Recognition Contrastive Learning +3

Paper
Add Code

Bridging Video-text Retrieval with Multiple Choice Questions

2 code implementations • CVPR 2022 • Yuying Ge, Yixiao Ge, Xihui Liu, Dian Li, Ying Shan, XiaoHu Qie, Ping Luo

As an additional benefit, our method achieves competitive results with much shorter pre-training videos on single-modality downstream tasks, e. g., action recognition with linear evaluation.

Ranked #8 on Zero-Shot Video Retrieval on MSVD

Action Recognition Multiple-choice +8

2,959

Paper
Code

Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation

no code implementations • CVPR 2022 • Zhaoyang Zeng, Yongsheng Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen

In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

object-detection Object Detection +2

656

Paper
Code

CLIP4Caption: CLIP for Video Caption

no code implementations • 13 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li

It is noted that our model is only trained on the MSR-VTT dataset.

Sentence Text Generation +3

Paper
Add Code

CLIP4Caption ++: Multi-CLIP for Video Caption

no code implementations • 11 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li

We make the following improvements on the proposed CLIP4Caption++: We employ an advanced encoder-decoder model architecture X-Transformer as our main framework and make the following improvements: 1) we utilize three strong pre-trained CLIP models to extract the text-related appearance visual features.

Sentence

Paper
Add Code

Enhancing Self-supervised Video Representation Learning via Multi-level Feature Optimization

1 code implementation • ICCV 2021 • Rui Qian, Yuxi Li, Huabin Liu, John See, Shuangrui Ding, Xian Liu, Dian Li, Weiyao Lin

The crux of self-supervised video representation learning is to build general features from unlabeled videos.

Contrastive Learning Representation Learning +1

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.